Multi-Armed Bandits with Interference
Abstract: Experimentation with interference poses a significant challenge in contemporary online platforms. Prior research on experimentation with interference has concentrated on the final output of a policy; the cumulative performance, while equally crucial, is less well understood. To address this gap, we introduce the problem of {\em Multi-armed Bandits with Interference} (MABI), where the learner assigns an arm to each of $N$ experimental units over a time horizon of $T$ rounds. The reward of each unit in each round depends on the treatments of {\em all} units, where the influence of a unit decays with the spatial distance between units. Furthermore, we employ a general setup wherein the reward functions are chosen by an adversary and may vary arbitrarily across rounds and units. We first show that switchback policies achieve an optimal {\em expected} regret of $\tilde O(\sqrt T)$ against the best fixed-arm policy. Nonetheless, the regret (as a random variable) of any switchback policy suffers from high variance, since such a policy does not exploit the population size $N$. We propose a cluster randomization policy whose regret (i) is optimal in {\em expectation} and (ii) admits a high-probability bound that vanishes as $N$ grows.
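To make the two randomization designs concrete, the sketch below contrasts their assignment mechanics: a switchback policy gives every unit the same arm and re-randomizes it at block boundaries, while a cluster randomization policy partitions the units (here, naively, into contiguous groups on a line) and draws an independent arm per cluster. This is a minimal illustration under assumed parameters; the function names, the contiguous-clustering choice, and the block structure are illustrative, not the paper's actual constructions.

```python
import random

def switchback_assignment(N, T, K, block_len, rng):
    """One arm shared by all N units each round, re-drawn every block_len rounds."""
    assignments = []
    arm = rng.randrange(K)
    for t in range(T):
        if t % block_len == 0:
            arm = rng.randrange(K)  # re-randomize at each block boundary
        assignments.append([arm] * N)
    return assignments

def cluster_assignment(N, T, K, cluster_size, block_len, rng):
    """Units on a line split into contiguous clusters; each cluster draws its
    own arm independently, re-drawn every block_len rounds."""
    n_clusters = (N + cluster_size - 1) // cluster_size
    assignments = []
    arms = [rng.randrange(K) for _ in range(n_clusters)]
    for t in range(T):
        if t % block_len == 0:
            arms = [rng.randrange(K) for _ in range(n_clusters)]
        assignments.append([arms[i // cluster_size] for i in range(N)])
    return assignments
```

Under this sketch, the switchback design exposes every unit to the same treatment (so interference never crosses treatment groups, but all $N$ observations per round are perfectly correlated), whereas the clustered design yields many nearly independent per-cluster observations per round, which is what drives the concentration in $N$.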