
Short-lived High-volume Multi-A(rmed)/B(andits) Testing (2312.15356v1)

Published 23 Dec 2023 in cs.LG and stat.ML

Abstract: Modern platforms leverage randomized experiments to make informed decisions from a given set of items ("treatments"). As a particularly challenging scenario, these items may (i) arrive in high volume, with thousands of new items being released per hour, and (ii) have short lifetime, say, due to the item's transient nature or underlying non-stationarity that impels the platform to perceive the same item as distinct copies over time. Thus motivated, we study a Bayesian multiple-play bandit problem that encapsulates the key features of the multivariate testing (or "multi-A/B testing") problem with a high volume of short-lived arms. In each round, a set of $k$ arms arrive, each available for $w$ rounds. Without knowing the mean reward for each arm, the learner selects a multiset of $n$ arms and immediately observes their realized rewards. We aim to minimize the loss due to not knowing the mean rewards, averaged over instances generated from a given prior distribution. We show that when $k = O(n^\rho)$ for some constant $\rho > 0$, our proposed policy has $\tilde O(n^{-\min\{\rho,\, \frac{1}{2}(1+\frac{1}{w})^{-1}\}})$ loss on a sufficiently large class of prior distributions. We complement this result by showing that every policy suffers $\Omega(n^{-\min\{\rho,\, \frac{1}{2}\}})$ loss on the same class of distributions. We further validate the effectiveness of our policy through a large-scale field experiment on {\em Glance}, a content-card-serving platform that faces exactly the above challenge. A simple variant of our policy outperforms the platform's current recommender by 4.32% in total duration and 7.48% in total number of click-throughs.
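To make the round structure concrete, the sketch below simulates the setting described in the abstract: every round, $k$ new arms arrive and stay available for $w$ rounds, and the learner pulls a multiset of $n$ arms with immediate reward feedback. The Beta prior, Bernoulli rewards, toy problem sizes, and the greedy scoring rule are all illustrative assumptions; the paper's actual policy and its loss analysis are not reproduced here.

```python
import numpy as np

# Toy simulation of the round structure from the abstract:
# every round, k fresh arms arrive and remain available for w rounds,
# and the learner pulls a multiset of n arms, observing rewards at once.
# The Beta prior, Bernoulli rewards, sizes, and the greedy scoring rule
# are illustrative assumptions, NOT the paper's proposed policy.

rng = np.random.default_rng(0)
k, w, n, T = 50, 3, 200, 100  # arms per round, lifetime, pulls per round, horizon


def score(arm):
    """Hypothetical optimistic empirical-mean score, used only for this sketch."""
    if arm["pulls"] == 0:
        return 1.0  # guarantees every fresh arm is tried at least once
    return arm["reward_sum"] / arm["pulls"]


arms, total_reward = [], 0.0
for t in range(T):
    # (i) high volume: k new arms arrive, mean rewards drawn from an assumed Beta prior
    arms += [{"mean": rng.beta(2, 5), "expires": t + w, "pulls": 0, "reward_sum": 0.0}
             for _ in range(k)]
    # (ii) short lifetime: arms older than w rounds leave the system
    arms = [a for a in arms if a["expires"] > t]

    # pull a multiset of n arms (repeats allowed), updating scores as rewards arrive
    for _ in range(n):
        a = max(arms, key=score)
        r = rng.binomial(1, a["mean"])  # Bernoulli reward observed immediately
        a["pulls"] += 1
        a["reward_sum"] += r
        total_reward += r

print(f"average per-pull reward: {total_reward / (n * T):.3f}")
```

One design point this toy setup makes visible: because each arm survives only $w$ rounds, exploration spent on an arm pays off for at most $w - 1$ further rounds, which is why the lifetime $w$ appears in the loss exponent quoted in the abstract.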

