
BanditQ: Fair Bandits with Guaranteed Rewards (2304.05219v3)

Published 11 Apr 2023 in cs.LG and cs.PF

Abstract: Classic no-regret multi-armed bandit algorithms, including the Upper Confidence Bound (UCB), Hedge, and EXP3, are inherently unfair by design. Their unfairness stems from their objective of playing the most rewarding arm as frequently as possible while ignoring the rest. In this paper, we consider a fair prediction problem in the stochastic setting with a guaranteed minimum rate of accrual of rewards for each arm. We study the problem in both full-information and bandit feedback settings. Combining queueing-theoretic techniques with adversarial bandits, we propose a new online policy, called BanditQ, that achieves the target reward rates while conceding a regret and target rate violation penalty of at most $O(T^{\frac{3}{4}})$. The regret bound in the full-information setting can be further improved to $O(\sqrt{T})$ under either a monotonicity assumption or when considering time-averaged regret. The proposed policy is efficient and admits a black-box reduction from the fair prediction problem to the standard adversarial MAB problem. The analysis of the BanditQ policy involves a new self-bounding inequality, which might be of independent interest.
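The abstract sketches the core mechanism: per-arm queues track how far each arm lags behind its guaranteed reward rate, and a black-box adversarial bandit is run on queue-weighted surrogate rewards. Below is a minimal illustrative sketch in Python, assuming Bernoulli arms, a Lindley-recursion deficit queue per arm, EXP3 as the black-box adversarial bandit, and a (V + Q)-style surrogate weighting; the exact surrogate construction and parameter tuning in the paper may differ, so treat every constant here as an assumption rather than the paper's prescription.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Hypothetical problem instance: Bernoulli arms and per-arm target rates ---
mu = np.array([0.9, 0.6, 0.4])      # true mean rewards (unknown to the policy)
lam = np.array([0.10, 0.10, 0.15])  # guaranteed minimum reward rate per arm
T = 20_000
K = len(mu)

# --- EXP3 state (the black-box adversarial bandit) ---
eta = np.sqrt(np.log(K) / (K * T))          # standard EXP3 learning rate
gamma = min(1.0, np.sqrt(K * np.log(K) / T))  # uniform-exploration mix
S = np.zeros(K)  # cumulative importance-weighted surrogate rewards

# --- Queue state: Q[i] is the deficit of arm i relative to its target rate ---
Q = np.zeros(K)
V = np.sqrt(T)   # queue/reward trade-off parameter (illustrative choice)

earned = np.zeros(K)  # cumulative reward accrued on each arm
total_reward = 0.0

for t in range(T):
    # EXP3 sampling distribution, mixed with uniform exploration for stability.
    w = np.exp(eta * (S - S.max()))  # subtract max for numerical stability
    p = (1.0 - gamma) * w / w.sum() + gamma / K

    arm = rng.choice(K, p=p)
    r = float(rng.random() < mu[arm])  # Bernoulli reward of the played arm
    total_reward += r
    earned[arm] += r

    # Surrogate reward: up-weight arms with long deficit queues, steering the
    # bandit toward arms that are behind on their guaranteed rates.
    # (An assumed weighting for illustration, not the paper's exact formula.)
    g = (V + Q[arm]) * r / (V + Q.max() + 1.0)  # keep surrogate in [0, 1]
    S[arm] += g / p[arm]  # importance-weighted update for the played arm only

    # Lindley recursion: each queue grows by its target rate and drains by
    # the reward actually earned on that arm this round.
    served = np.zeros(K)
    served[arm] = r
    Q = np.maximum(Q + lam - served, 0.0)

print(f"avg reward: {total_reward / T:.3f}")
print("per-arm accrual rates:", np.round(earned / T, 3), "targets:", lam)
```

The design intuition is that the queues act as Lagrange-multiplier-like pressure: while some arm's deficit queue is long, its surrogate rewards are inflated and the bandit is pushed to serve it; once all deficits drain to zero, the surrogate is approximately the raw reward and the policy behaves like plain EXP3 chasing the best arm.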


Authors (1): Abhishek Sinha
