Budgeted Recommendation with Delayed Feedback (2405.11417v1)
Abstract: In a conventional contextual multi-armed bandit problem, the feedback (or reward) is observable immediately after an action. However, delayed feedback arises in numerous real-life situations and is especially consequential in time-sensitive applications. The exploration-exploitation dilemma becomes particularly challenging under such conditions because it couples with the interplay between delays and limited resources, and a limited budget further aggravates the problem by restricting the potential for exploration. A motivating example is the distribution of medical supplies in the early stage of COVID-19: delayed test results provided insufficient information for learning and degraded the efficiency of resource allocation. Motivated by such applications, we study the effect of delayed feedback on constrained contextual bandits. We develop a decision-making policy, delay-oriented resource allocation with learning (DORAL), to optimize resource expenditure in a contextual multi-armed bandit problem with arm-dependent delayed feedback.
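The abstract describes the setting only at a high level. The Python sketch below is a minimal, assumption-laden illustration of that setting: each pull consumes part of a fixed budget and its reward is revealed only after an arm-dependent random delay. A simple epsilon-greedy, reward-per-cost baseline stands in for the paper's DORAL policy, whose details are not given here; all parameter values, delay distributions, and the baseline rule are hypothetical.

```python
# Sketch of a budgeted contextual bandit with arm-dependent delayed feedback.
# NOT the paper's DORAL policy; epsilon-greedy is used only as a placeholder.
import numpy as np

rng = np.random.default_rng(0)

n_arms, d, horizon, budget = 4, 5, 2000, 500.0
theta = rng.normal(size=(n_arms, d))            # unknown per-arm reward parameters (assumed linear)
cost = rng.uniform(0.5, 1.5, size=n_arms)       # per-pull resource cost of each arm
mean_delay = rng.integers(1, 30, size=n_arms)   # arm-dependent mean feedback delay (rounds)

# Running regularized least-squares estimates, updated only when delayed feedback arrives.
A = np.stack([np.eye(d) for _ in range(n_arms)])   # per-arm Gram matrices
b = np.zeros((n_arms, d))
pending = []                                       # (arrival_round, arm, context, reward)

spent, collected = 0.0, 0.0
for t in range(horizon):
    if spent >= budget:                            # budget exhausted: stop pulling
        break
    x = rng.normal(size=d)                         # observed context

    # Epsilon-greedy on estimated reward-per-cost (placeholder decision rule).
    if rng.random() < 0.1:
        arm = int(rng.integers(n_arms))
    else:
        est = np.array([x @ np.linalg.solve(A[k], b[k]) for k in range(n_arms)])
        arm = int(np.argmax(est / cost))

    spent += cost[arm]
    reward = x @ theta[arm] + rng.normal(scale=0.1)
    delay = rng.geometric(1.0 / mean_delay[arm])   # arm-dependent delay before feedback
    pending.append((t + delay, arm, x, reward))

    # Incorporate only the feedback whose delay has elapsed by round t.
    ready = [p for p in pending if p[0] <= t]
    pending = [p for p in pending if p[0] > t]
    for _, k, xk, rk in ready:
        A[k] += np.outer(xk, xk)
        b[k] += rk * xk
        collected += rk

print(f"rounds elapsed: {t + 1}, budget spent: {spent:.1f}, observed reward: {collected:.1f}")
```

The sketch highlights the two frictions the abstract points to: estimates are refreshed only when delayed observations arrive, and the budget caps how many exploratory pulls are possible before learning has caught up.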