
Budgeted Recommendation with Delayed Feedback (2405.11417v1)

Published 19 May 2024 in cs.LG

Abstract: In a conventional contextual multi-armed bandit problem, the feedback (or reward) is observable immediately after an action. Delayed feedback, however, arises in many real-life situations and is particularly crucial in time-sensitive applications. The exploration-exploitation dilemma becomes harder under such conditions because it is coupled with the interplay between delays and limited resources, and a limited budget aggravates the problem further by restricting the exploration potential. A motivating example is the distribution of medical supplies at the early stage of COVID-19: delayed testing results left too little information for learning, which degraded the efficiency of resource allocation. Motivated by such applications, we study the effect of delayed feedback on constrained contextual bandits. We develop a decision-making policy, delay-oriented resource allocation with learning (DORAL), to optimize resource expenditure in a contextual multi-armed bandit problem with arm-dependent delayed feedback.
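
The problem setting described in the abstract can be illustrated with a small simulation. The sketch below is not DORAL, and the abstract does not specify that algorithm; it is a plain UCB-style index divided by cost, used here only to show how a budget and arm-dependent delays interact. Contexts are omitted for brevity, and the function name `run_budgeted_delayed_ucb`, the Bernoulli rewards, the fixed per-arm delays, and the per-pull costs are all assumptions made for illustration.

```python
import math
import random


def run_budgeted_delayed_ucb(means, delays, costs, budget, horizon, seed=0):
    """Simulate a UCB-per-cost policy with a budget and arm-dependent delays.

    Hypothetical illustration only: rewards are Bernoulli(means[a]), the
    reward of a pull on arm a becomes visible delays[a] rounds later, and
    each pull on arm a consumes costs[a] units of the budget.
    """
    rng = random.Random(seed)
    n_arms = len(means)
    observed_pulls = [0] * n_arms    # pulls whose feedback has arrived
    observed_sum = [0.0] * n_arms    # sum of rewards already observed
    pending = []                     # (arrival_round, arm, reward)
    spent = 0.0
    total_reward = 0.0

    for t in range(1, horizon + 1):
        # Deliver feedback whose delay has elapsed; keep the rest pending.
        still_pending = []
        for arrival, arm, reward in pending:
            if arrival <= t:
                observed_pulls[arm] += 1
                observed_sum[arm] += reward
            else:
                still_pending.append((arrival, arm, reward))
        pending = still_pending

        # Only arms that still fit in the remaining budget are eligible.
        affordable = [a for a in range(n_arms) if spent + costs[a] <= budget]
        if not affordable:
            break

        def index(a):
            # Arms with no observed feedback yet are treated optimistically.
            if observed_pulls[a] == 0:
                return float("inf")
            mean = observed_sum[a] / observed_pulls[a]
            bonus = math.sqrt(2.0 * math.log(t) / observed_pulls[a])
            return (mean + bonus) / costs[a]

        arm = max(affordable, key=index)
        reward = 1.0 if rng.random() < means[arm] else 0.0
        spent += costs[arm]
        total_reward += reward
        # The learner sees this reward only after the arm's delay.
        pending.append((t + delays[arm], arm, reward))

    return {
        "spent": spent,
        "total_reward": total_reward,
        "observed_pulls": observed_pulls,
    }


result = run_budgeted_delayed_ucb(
    means=[0.2, 0.8], delays=[1, 5], costs=[1.0, 1.0], budget=50, horizon=200
)
print(result)
```

Because statistics are updated only when feedback arrives, an arm with a long delay keeps its optimistic index for several rounds and absorbs budget before the learner sees any evidence about it, which is the coupling between delays and limited resources that the abstract highlights.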

