
Contextual Combinatorial Multi-output GP Bandits with Group Constraints (2111.14778v2)

Published 29 Nov 2021 in cs.LG, stat.AP, and stat.ML

Abstract: In federated multi-armed bandit problems, the main goal is to maximize global reward while satisfying minimum privacy requirements that protect clients. To formulate such problems, we consider a combinatorial contextual bandit setting with groups and changing action sets, where similar base arms arrive in groups and a set of base arms, called a super arm, must be chosen in each round to maximize super arm reward while satisfying constraints on the rewards of the groups from which base arms were chosen. To allow for greater flexibility, we let each base arm have two outcomes, modeled as the output of a two-output Gaussian process (GP), where one outcome is used to compute super arm reward and the other to compute group reward. We then propose a novel double-UCB GP-bandit algorithm, called Thresholded Combinatorial Gaussian Process Upper Confidence Bounds (TCGP-UCB), which balances maximizing cumulative super arm reward against satisfying group reward constraints and can be tuned to prefer one over the other. We also define a new notion of regret that combines super arm regret with group reward constraint satisfaction and prove that TCGP-UCB incurs $\tilde{O}(\sqrt{\lambda^*(K) K T \overline{\gamma}_T})$ regret with high probability, where $\overline{\gamma}_T$ is the maximum information gain associated with the set of base arm contexts that appeared in the first $T$ rounds and $K$ is the maximum super arm cardinality over all rounds. Lastly, we show in experiments using synthetic and real-world data, in both a federated learning setup and a content-recommendation one, that our algorithm performs better than the current non-GP state-of-the-art combinatorial bandit algorithm while satisfying group constraints.
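To make the double-UCB idea concrete, here is a minimal sketch of one selection round. This is not the authors' implementation: the greedy feasibility filter, the threshold, the exploration parameter beta, and the use of two independent GPs in place of the paper's two-output GP are all simplifying assumptions for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# Synthetic history of base-arm contexts with their two observed outcomes.
X_hist = rng.uniform(0.0, 1.0, size=(30, 2))
y_reward = np.sin(3 * X_hist[:, 0]) + 0.1 * rng.standard_normal(30)  # super-arm outcome
y_group = np.cos(3 * X_hist[:, 1]) + 0.1 * rng.standard_normal(30)   # group outcome

# Two independent GPs stand in for the paper's two-output GP posterior.
gp_reward = GaussianProcessRegressor(kernel=RBF(0.2), alpha=1e-2).fit(X_hist, y_reward)
gp_group = GaussianProcessRegressor(kernel=RBF(0.2), alpha=1e-2).fit(X_hist, y_group)

def select_super_arm(contexts, groups, K, beta=2.0, threshold=0.0):
    """Pick up to K base arms by double UCB: reward UCBs rank the arms,
    group-outcome UCBs decide which groups look feasible this round."""
    mu_r, sd_r = gp_reward.predict(contexts, return_std=True)
    mu_g, sd_g = gp_group.predict(contexts, return_std=True)
    ucb_reward = mu_r + np.sqrt(beta) * sd_r  # optimistic super-arm index
    ucb_group = mu_g + np.sqrt(beta) * sd_g   # optimistic group index
    # A group is optimistically feasible if its average group-outcome UCB
    # clears the threshold (a simplification of the group-reward constraint).
    feasible_groups = {g for g in np.unique(groups)
                       if ucb_group[groups == g].mean() >= threshold}
    eligible = np.flatnonzero([g in feasible_groups for g in groups])
    if eligible.size == 0:  # nothing looks feasible yet: explore uncertainty
        eligible = np.argsort(-sd_g)[:K]
    order = eligible[np.argsort(-ucb_reward[eligible])]
    return order[:K]

# One round with a changing action set: 10 available base arms in 3 groups.
contexts_t = rng.uniform(0.0, 1.0, size=(10, 2))
groups_t = rng.integers(0, 3, size=10)
print("chosen super arm (base-arm indices):", select_super_arm(contexts_t, groups_t, K=3))
```

In the full algorithm, each round's observed outcomes would be fed back into the GP posterior before the next selection; the sketch fits the GPs once only for brevity.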

Authors (4)
  1. Sepehr Elahi (6 papers)
  2. Baran Atalar (2 papers)
  3. Sevda Öğüt (1 paper)
  4. Cem Tekin (47 papers)
Citations (2)
