Budgeted Recommendation with Delayed Feedback (2405.11417v1)
Abstract: In a conventional contextual multi-armed bandit problem, the feedback (or reward) is observable immediately after an action. However, delayed feedback arises in numerous real-life situations and is especially consequential in time-sensitive applications. The exploration-exploitation dilemma becomes particularly challenging under such conditions because it couples with the interplay between delays and limited resources, and a limited budget further aggravates the problem by restricting the potential for exploration. A motivating example is the distribution of medical supplies in the early stage of COVID-19: delayed test results provided insufficient information for learning and degraded the efficiency of resource allocation. Motivated by such applications, we study the effect of delayed feedback on constrained contextual bandits. We develop a decision-making policy, delay-oriented resource allocation with learning (DORAL), to optimize resource expenditure in a contextual multi-armed bandit problem with arm-dependent delayed feedback.
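The abstract describes the setting only at a high level. The Python sketch below is a minimal, assumption-laden illustration of that setting: each pull consumes part of a fixed budget and its reward is revealed only after an arm-dependent random delay. A simple epsilon-greedy, reward-per-cost baseline stands in for the paper's DORAL policy, whose details are not given here; all parameter values, delay distributions, and the baseline rule are hypothetical.

```python
# Sketch of a budgeted contextual bandit with arm-dependent delayed feedback.
# NOT the paper's DORAL policy; epsilon-greedy is used only as a placeholder.
import numpy as np

rng = np.random.default_rng(0)

n_arms, d, horizon, budget = 4, 5, 2000, 500.0
theta = rng.normal(size=(n_arms, d))            # unknown per-arm reward parameters (assumed linear)
cost = rng.uniform(0.5, 1.5, size=n_arms)       # per-pull resource cost of each arm
mean_delay = rng.integers(1, 30, size=n_arms)   # arm-dependent mean feedback delay (rounds)

# Running regularized least-squares estimates, updated only when delayed feedback arrives.
A = np.stack([np.eye(d) for _ in range(n_arms)])   # per-arm Gram matrices
b = np.zeros((n_arms, d))
pending = []                                       # (arrival_round, arm, context, reward)

spent, collected = 0.0, 0.0
for t in range(horizon):
    if spent >= budget:                            # budget exhausted: stop pulling
        break
    x = rng.normal(size=d)                         # observed context

    # Epsilon-greedy on estimated reward-per-cost (placeholder decision rule).
    if rng.random() < 0.1:
        arm = int(rng.integers(n_arms))
    else:
        est = np.array([x @ np.linalg.solve(A[k], b[k]) for k in range(n_arms)])
        arm = int(np.argmax(est / cost))

    spent += cost[arm]
    reward = x @ theta[arm] + rng.normal(scale=0.1)
    delay = rng.geometric(1.0 / mean_delay[arm])   # arm-dependent delay before feedback
    pending.append((t + delay, arm, x, reward))

    # Incorporate only the feedback whose delay has elapsed by round t.
    ready = [p for p in pending if p[0] <= t]
    pending = [p for p in pending if p[0] > t]
    for _, k, xk, rk in ready:
        A[k] += np.outer(xk, xk)
        b[k] += rk * xk
        collected += rk

print(f"rounds elapsed: {t + 1}, budget spent: {spent:.1f}, observed reward: {collected:.1f}")
```

The sketch highlights the two frictions the abstract points to: estimates are refreshed only when delayed observations arrive, and the budget caps how many exploratory pulls are possible before learning has caught up.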