Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
97 tokens/sec
GPT-4o
53 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Low-rank Bandits with Latent Mixtures (1609.01508v1)

Published 6 Sep 2016 in cs.LG

Abstract: We study the task of maximizing rewards from recommending items (actions) to users sequentially interacting with a recommender system. Users are modeled as latent mixtures of C many representative user classes, where each class specifies a mean reward profile across actions. Both the user features (mixture distribution over classes) and the item features (mean reward vector per class) are unknown a priori. The user identity is the only contextual information available to the learner while interacting. This induces a low-rank structure on the matrix of expected rewards r a,b from recommending item a to user b. The problem reduces to the well-known linear bandit when either user or item-side features are perfectly known. In the setting where each user, with its stochastically sampled taste profile, interacts only for a small number of sessions, we develop a bandit algorithm for the two-sided uncertainty. It combines the Robust Tensor Power Method of Anandkumar et al. (2014b) with the OFUL linear bandit algorithm of Abbasi-Yadkori et al. (2011). We provide the first rigorous regret analysis of this combination, showing that its regret after T user interactions is $\tilde O(C\sqrt{BT})$, with B the number of users. An ingredient towards this result is a novel robustness property of OFUL, of independent interest.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (3)
  1. Aditya Gopalan (53 papers)
  2. Odalric-Ambrym Maillard (48 papers)
  3. Mohammadi Zaki (11 papers)
Citations (27)

Summary

We haven't generated a summary for this paper yet.