
Low-rank Matrix Bandits with Heavy-tailed Rewards (2404.17709v1)

Published 26 Apr 2024 in stat.ML and cs.LG

Abstract: In stochastic low-rank matrix bandit, the expected reward of an arm is equal to the inner product between its feature matrix and some unknown $d_1$ by $d_2$ low-rank parameter matrix $\Theta^*$ with rank $r \ll d_1\wedge d_2$. While all prior studies assume the payoffs are mixed with sub-Gaussian noises, in this work we loosen this strict assumption and consider the new problem of \underline{low}-rank matrix bandit with \underline{h}eavy-\underline{t}ailed \underline{r}ewards (LowHTR), where the rewards only have finite $(1+\delta)$ moment for some $\delta \in (0,1]$. By utilizing the truncation on observed payoffs and the dynamic exploration, we propose a novel algorithm called LOTUS attaining the regret bound of order $\tilde O(d^{3/2}r^{1/2}T^{1/(1+\delta)}/\tilde{D}_{rr})$ without knowing $T$, which matches the state-of-the-art regret bound under sub-Gaussian noises~\citep{lu2021low,kang2022efficient} with $\delta = 1$. Moreover, we establish a lower bound of the order $\Omega(d^{\delta/(1+\delta)} r^{\delta/(1+\delta)} T^{1/(1+\delta)}) = \Omega(T^{1/(1+\delta)})$ for LowHTR, which indicates our LOTUS is nearly optimal in the order of $T$. In addition, we improve LOTUS so that it does not require knowledge of the rank $r$ with $\tilde O(dr^{3/2}T^{(1+\delta)/(1+2\delta)})$ regret bound, and it is efficient under the high-dimensional scenario. We also conduct simulations to demonstrate the practical superiority of our algorithm.

Exploring Low-Rank Matrix Bandits with Heavy-Tailed Rewards using LOTUS

Introduction

In the stochastic low-rank matrix bandit setting, a standard assumption in the literature is that the noise variables are sub-Gaussian. This paper introduces a novel variant of the framework: heavy-tailed rewards, specifically rewards that only possess finite (1+δ) moments for some δ in the range (0,1]. Termed LowHTR, this setup moves beyond the classic assumption, addressing more realistic scenarios where extreme values are common, such as in finance and recommendation systems.

Contributions and Algorithmic Approach

The principal contributions of this paper are as follows:

  1. The introduction of LOTUS, an efficient method equipped to handle the LowHTR setting, achieving a regret bound that matches the state of the art for the sub-Gaussian case when δ = 1.
  2. Comprehensive theoretical bounds, including a lower bound showing that the proposed algorithm is nearly order-optimal with respect to T, the time horizon.
  3. An extension to the algorithm that operates without prior knowledge of the rank rr, enhancing its practical utility.
  4. Empirical evaluations demonstrating superior performance of LOTUS, particularly under various heavy-tailed noise conditions.

Key to LOTUS’s strategy is the innovative application of payoff truncation paired with dynamic exploration-exploitation balancing, which adapts seamlessly whether or not T is known a priori. Furthermore, the paper effectively leverages methods from robust statistics, including a convex relaxation-based estimator for handling heavy-tailed data, contributing a novel analytical approach to the trace regression problem in this setting.
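The truncation idea can be sketched as follows. This is a generic illustration of payoff truncation for heavy-tailed rewards, not the paper's exact procedure: the threshold schedule and the moment-bound constant `v` here are illustrative assumptions.

```python
import numpy as np

def truncate_payoffs(rewards, delta, v=1.0):
    """Zero out any observed payoff whose magnitude exceeds a threshold
    b_t growing like (v * t)^(1/(1+delta)), so that rare extreme draws
    cannot destabilize the running reward estimate.

    rewards : payoffs y_1, ..., y_t observed so far
    delta   : moment parameter; E|y|^(1+delta) is assumed finite
    v       : assumed bound on the (1+delta)-th raw moment (hypothetical)
    """
    rewards = np.asarray(rewards, dtype=float)
    t = np.arange(1, len(rewards) + 1)
    b = (v * t) ** (1.0 / (1.0 + delta))  # time-varying truncation level
    return np.where(np.abs(rewards) <= b, rewards, 0.0)

# With delta = 1 the thresholds are b_t = sqrt(t): the outlier 100.0 at
# t = 2 is removed, while payoffs inside the growing envelope survive.
clean = truncate_payoffs([0.5, 100.0, 1.5], delta=1.0)
```

The mean of the truncated payoffs remains a stable estimate of the expected reward even when the raw sample mean is dominated by a few extreme draws, which is the standard robustness mechanism behind truncation-based bandit algorithms.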

Theoretical Analysis

  • The algorithm achieves a regret bound of Õ(d^{3/2} r^{1/2} T^{1/(1+δ)} / D̃_{rr}) without requiring knowledge of the horizon T, matching the state-of-the-art bound under sub-Gaussian noise when δ = 1.
  • A lower bound of Ω(d^{δ/(1+δ)} r^{δ/(1+δ)} T^{1/(1+δ)}) is established, indicating that LOTUS is nearly optimal in its dependence on T.
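For concreteness, specializing both bounds to δ = 1 (i.e., finite variance) shows how LOTUS recovers the familiar √T rate of the sub-Gaussian setting:

```latex
\underbrace{\tilde{O}\!\left(d^{3/2} r^{1/2}\, T^{1/(1+\delta)} / \tilde{D}_{rr}\right)}_{\text{upper bound}}
\;\xrightarrow{\;\delta = 1\;}\;
\tilde{O}\!\left(d^{3/2} r^{1/2}\, \sqrt{T} / \tilde{D}_{rr}\right),
\qquad
\underbrace{\Omega\!\left(d^{\delta/(1+\delta)}\, r^{\delta/(1+\delta)}\, T^{1/(1+\delta)}\right)}_{\text{lower bound}}
\;\xrightarrow{\;\delta = 1\;}\;
\Omega\!\left(\sqrt{d\,r\,T}\right).
```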

Notable Methods and Extensions

  • LOTUS under Unknown Rank: Making LOTUS operational without knowledge of r is nontrivial and showcases the algorithm’s adaptability. The extension attains a slightly worse Õ(d r^{3/2} T^{(1+δ)/(1+2δ)}) regret bound, which remains notable given the difficulty of handling an unknown rank.
  • Matrix Estimation: The proposed novel Huber-type estimator for robust recovery of a low-rank matrix under heavy-tailed noise offers significant contributions to statistical estimation theory. The methodology can adapt to scenarios where only weaker assumptions about noise behavior are possible.
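The robust-estimation idea can be illustrated with a generic trace-regression sketch. This is not the paper's exact Huber-type estimator: the clipping level `tau`, the nuclear-norm weight `lam`, and the proximal-gradient loop below are illustrative assumptions.

```python
import numpy as np

def huber_trace_regression(X, y, tau=1.0, lam=0.01, lr=0.05, steps=1000):
    """Robust low-rank matrix recovery sketch: proximal gradient descent
    on the Huber loss of the residuals <X_i, Theta> - y_i, with a
    nuclear-norm penalty enforced by soft-thresholding singular values.

    X : (n, d1, d2) arm feature matrices
    y : (n,) possibly heavy-tailed payoffs
    """
    n, d1, d2 = X.shape
    Theta = np.zeros((d1, d2))
    for _ in range(steps):
        residual = np.einsum('nij,ij->n', X, Theta) - y
        # Huber gradient: residuals are clipped at tau, so one huge
        # payoff exerts only a bounded pull on the estimate
        g = np.clip(residual, -tau, tau)
        Theta -= lr * np.einsum('n,nij->ij', g, X) / n
        # proximal step for the nuclear norm: shrink singular values
        U, s, Vt = np.linalg.svd(Theta, full_matrices=False)
        Theta = U @ np.diag(np.maximum(s - lr * lam, 0.0)) @ Vt
    return Theta
```

Clipping the residuals bounds the influence of any single heavy-tailed payoff, while the singular-value soft-threshold biases the iterate toward low rank, mirroring the two assumptions (finite (1+δ) moment, rank r structure) of the LowHTR model.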

Implications and Speculations on Future Work

The introduction of heavy-tailed rewards in the low-rank matrix bandit problem opens several avenues for future research. Primarily, it challenges existing theoretical frameworks and calls for robust algorithm designs that can cope with more variable and unpredictable data distributions. This paper sets a foundational step towards understanding and tackling such variations in stochastic environments with low-rank structures. Future research might focus on deeply understanding the dimensionality effects in LowHTR settings or exploring more complex distributions of heavy-tailed noise, potentially looking into asymmetric or long-range dependent structures.

Authors (3)
  1. Yue Kang (12 papers)
  2. Cho-Jui Hsieh (211 papers)
  3. Thomas C. M. Lee (34 papers)