Misalignment, Learning, and Ranking: Harnessing Users Limited Attention (2402.14013v1)

Published 21 Feb 2024 in cs.LG and cs.DS

Abstract: In digital health and EdTech, recommendation systems face a significant challenge: users often choose impulsively, in ways that conflict with the platform's long-term payoffs. This misalignment makes it difficult to learn to rank items effectively, as it may hinder exploration of items with greater long-term payoffs. Our paper tackles this issue by utilizing users' limited attention spans. We propose a model where a platform presents a ranked list of items, whose payoffs to the platform are unknown, to $T$ users over time. Each user selects an item by first considering a prefix window of the ranked items and then picking the most preferred item in that window (and the platform observes its payoff for this item). We study the design of online bandit algorithms that obtain vanishing regret against hindsight-optimal benchmarks. We first consider adversarial window sizes and stochastic i.i.d. payoffs. We design an active-elimination-based algorithm that achieves an optimal instance-dependent regret bound of $O(\log(T))$ by showing matching regret upper and lower bounds. The key idea is to use the combinatorial structure of the problem to either obtain a large payoff from each item or explore by getting a sample from that item. This method systematically narrows down the item choices to enhance learning efficiency and payoff. Second, we consider adversarial payoffs and stochastic i.i.d. window sizes. We start from the full-information problem of finding the permutation that maximizes the expected payoff. By a novel combinatorial argument, we characterize the polytope of item selection probabilities admissible under some permutation and show that it has a polynomial-size representation. Using this representation, we show how standard algorithms for adversarial online linear optimization over the space of admissible probabilities yield a polynomial-time algorithm with $O(\sqrt{T})$ regret.
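
The prefix-window choice model described in the abstract (each user inspects only the first few positions of the ranked list and then picks the item they prefer most in that window, while the platform observes its own payoff for the chosen item) can be made concrete with a small simulation. The sketch below is illustrative only; the function names, the preference/payoff dictionaries, and the Gaussian payoff noise are assumptions introduced here, not the paper's notation.

```python
import random


def user_choice(ranking, window_size, preferences):
    """Pick the item the user prefers most among the first `window_size`
    positions of `ranking` (the prefix-window choice model from the abstract).

    ranking     -- list of item ids in the order shown by the platform
    window_size -- number of top-ranked items the user actually considers
    preferences -- dict: item id -> the user's preference score (higher = better)
    """
    window = ranking[:window_size]
    return max(window, key=lambda item: preferences[item])


def run_round(ranking, window_size, preferences, platform_payoffs, rng):
    """One interaction: the user picks an item from the prefix window and the
    platform observes a noisy sample of its own payoff for that item."""
    chosen = user_choice(ranking, window_size, preferences)
    observed_payoff = platform_payoffs[chosen] + rng.gauss(0, 0.1)
    return chosen, observed_payoff


if __name__ == "__main__":
    rng = random.Random(0)
    items = ["a", "b", "c", "d"]
    prefs = {"a": 0.2, "b": 0.9, "c": 0.5, "d": 0.7}    # user-side preferences
    payoffs = {"a": 0.8, "b": 0.1, "c": 0.6, "d": 0.3}  # platform-side payoffs
    # Misalignment: the user's favourite item "b" has the lowest platform payoff,
    # so the ranking (and window size) determines what the platform can learn.
    chosen, observed = run_round(items, 2, prefs, payoffs, rng)
    print(chosen, observed)
```

Note how the ranking itself is the platform's only lever: by deciding which items appear inside the user's window, it trades off the payoff it collects against the items it gets to observe.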
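
The first result builds on active elimination. As background only, here is a generic successive-elimination sketch for a plain stochastic bandit; it is not the paper's combinatorial algorithm (which extracts either a large payoff or an exploratory sample from each item through the ranking), but it illustrates the "systematically narrow down the item choices" mechanism. The Hoeffding-style confidence radius and all names are assumptions made for this sketch.

```python
import math
import random


def successive_elimination(arm_means, pulls_per_round, num_rounds, rng):
    """Generic successive elimination for a stochastic bandit (illustrative).

    arm_means -- dict: arm id -> true mean payoff, used only to simulate
                 Bernoulli feedback; the learner never reads it directly.
    """
    active = set(arm_means)
    means = {a: 0.0 for a in arm_means}   # empirical means
    counts = {a: 0 for a in arm_means}    # number of samples per arm

    for _ in range(num_rounds):
        # Sample every still-active arm equally often.
        for a in active:
            for _ in range(pulls_per_round):
                reward = 1.0 if rng.random() < arm_means[a] else 0.0
                counts[a] += 1
                means[a] += (reward - means[a]) / counts[a]

        # Confidence radius shrinks as an arm accumulates samples.
        radius = {a: math.sqrt(2 * math.log(1 + counts[a]) / counts[a])
                  for a in active}
        best_lcb = max(means[a] - radius[a] for a in active)

        # Drop arms whose upper confidence bound falls below the best
        # lower confidence bound: they cannot be the best arm.
        active = {a for a in active if means[a] + radius[a] >= best_lcb}

    return active


if __name__ == "__main__":
    rng = random.Random(1)
    survivors = successive_elimination({"a": 0.8, "b": 0.5, "c": 0.3},
                                       pulls_per_round=50, num_rounds=20, rng=rng)
    print(survivors)  # typically only the best arm "a" remains
```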

