
Borda Regret Minimization for Generalized Linear Dueling Bandits (2303.08816v2)

Published 15 Mar 2023 in cs.LG and stat.ML

Abstract: Dueling bandits are widely used to model preferential feedback prevalent in many applications such as recommendation systems and ranking. In this paper, we study the Borda regret minimization problem for dueling bandits, which aims to identify the item with the highest Borda score while minimizing the cumulative regret. We propose a rich class of generalized linear dueling bandit models, which cover many existing models. We first prove a regret lower bound of order $\Omega(d^{2/3} T^{2/3})$ for the Borda regret minimization problem, where $d$ is the dimension of contextual vectors and $T$ is the time horizon. To attain this lower bound, we propose an explore-then-commit type algorithm for the stochastic setting, which has a nearly matching regret upper bound $\tilde{O}(d^{2/3} T^{2/3})$. We also propose an EXP3-type algorithm for the adversarial linear setting, where the underlying model parameter can change at each round. Our algorithm achieves an $\tilde{O}(d^{2/3} T^{2/3})$ regret, which is also optimal. Empirical evaluations on both synthetic data and a simulated real-world environment are conducted to corroborate our theoretical analysis.
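To make the objective concrete, the sketch below illustrates the Borda score and a toy explore-then-commit loop for the finite-armed, non-contextual case. This is not the paper's algorithm (which handles generalized linear contextual models); it is a minimal illustration assuming a known-form preference matrix `P`, where `P[i, j]` is the probability that item `i` beats item `j`, the Borda score is $B(i) = \frac{1}{K-1}\sum_{j \neq i} P(i, j)$, and the per-round Borda regret of dueling the pair $(i_t, j_t)$ is $2B(i^*) - B(i_t) - B(j_t)$.

```python
import numpy as np

def borda_scores(P):
    """Borda score of item i: the average probability that i beats a
    uniformly chosen other item, B(i) = (1/(K-1)) * sum_{j != i} P[i, j]."""
    K = P.shape[0]
    return (P.sum(axis=1) - np.diag(P)) / (K - 1)

def explore_then_commit(P, T, explore_frac=0.3, rng=None):
    """Toy explore-then-commit for Borda regret minimization.

    Explore phase: duel uniformly random pairs and record wins.
    Commit phase: repeatedly duel (i_hat, i_hat), where i_hat maximizes
    the empirical Borda score. Returns the cumulative Borda regret.
    """
    rng = np.random.default_rng(rng)
    K = P.shape[0]
    B = borda_scores(P)
    best = B.max()
    wins = np.zeros((K, K))
    counts = np.zeros((K, K))
    n_explore = int(explore_frac * T)
    regret = 0.0
    for t in range(T):
        if t < n_explore:
            i, j = rng.integers(K), rng.integers(K)
        else:
            # Empirical preference probabilities; unseen pairs default to 0.5.
            est = np.divide(wins, counts,
                            out=np.full((K, K), 0.5), where=counts > 0)
            np.fill_diagonal(est, 0.5)
            i = j = int(np.argmax((est.sum(axis=1) - 0.5) / (K - 1)))
        # Run the duel: i beats j with probability P[i, j].
        if rng.random() < P[i, j]:
            wins[i, j] += 1
        else:
            wins[j, i] += 1
        counts[i, j] += 1
        counts[j, i] += 1
        regret += 2 * best - B[i] - B[j]
    return regret
```

The key trade-off the paper formalizes is visible even in this sketch: exploration duels informative but suboptimal pairs, while the commit phase exploits the estimated Borda winner, and balancing the two phases yields the $T^{2/3}$-type regret rate.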

Authors (5)
  1. Yue Wu
  2. Tao Jin
  3. Hao Lou
  4. Farzad Farnoud
  5. Quanquan Gu
Citations (10)