Spectral Thompson sampling

Published 15 Apr 2026 in cs.LG and stat.ML | (2604.13739v1)

Abstract: Thompson Sampling (TS) has attracted a lot of interest due to its good empirical performance, in particular in computational advertising. Though successful, the tools for its performance analysis appeared only recently. In this paper, we describe and analyze the SpectralTS algorithm for a bandit problem where the payoffs of the choices are smooth given an underlying graph. In this setting, each choice is a node of a graph and the expected payoffs of neighboring nodes are assumed to be similar. Although the setting has applications in both recommender systems and advertising, traditional algorithms would scale poorly with the number of choices. For that purpose we consider an effective dimension d, which is small in real-world graphs. We deliver an analysis showing that the regret of SpectralTS scales as d*sqrt(T ln N) with high probability, where T is the time horizon and N is the number of choices. Since a d*sqrt(T ln N) regret is comparable to the known results, SpectralTS offers a computationally more efficient alternative. We also show that our algorithm is competitive on both synthetic and real-world data.


Authors (4)

Summary

The paper presents a novel algorithm that leverages graph Laplacian eigenvectors to exploit smooth payoffs in multi-armed bandit problems.
It achieves a cumulative regret bound of O(d√(T log N)), which depends on the effective dimension d rather than on the much larger ambient dimension N.
Experimental results show that SpectralTS matches or slightly improves on the regret of SpectralUCB at a lower computational cost per round, and clearly outperforms generic linear bandit methods such as LinUCB.

Spectral Thompson Sampling: Scalable Learning in Bandits with Graph-Structured Smooth Payoffs

Problem Setting and Motivation

The paper presents Spectral Thompson Sampling (SpectralTS), an algorithm for the stochastic multi-armed bandit problem where the expected payoffs over actions are smooth with respect to an underlying graph. Each arm corresponds to a node in the graph, and the smoothness prior captures the assumption that neighboring arms have similar expected rewards. This setting is particularly relevant to recommender systems (e.g., movies, items, or ads related by similarity graphs) and to other domains where actions carry such relational priors.
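
A common way to quantify "neighboring arms have similar payoffs" is the Laplacian quadratic form $y^T L y$, which sums the squared payoff differences across edges. The minimal numpy sketch below, on a hypothetical 4-node path graph with made-up payoff vectors, shows that a smooth payoff vector scores low and an alternating one scores high; the helper names `laplacian` and `smoothness` are illustrative, not from the paper.

```python
import numpy as np

def laplacian(adj: np.ndarray) -> np.ndarray:
    """Combinatorial graph Laplacian L = D - A of a symmetric adjacency matrix."""
    return np.diag(adj.sum(axis=1)) - adj

def smoothness(adj: np.ndarray, y: np.ndarray) -> float:
    """Laplacian quadratic form y^T L y = sum over edges (u, v) of w_uv * (y_u - y_v)^2."""
    return float(y @ laplacian(adj) @ y)

# Hypothetical path graph 0 - 1 - 2 - 3 (four arms)
A = np.zeros((4, 4))
for u, v in [(0, 1), (1, 2), (2, 3)]:
    A[u, v] = A[v, u] = 1.0

smooth = np.array([1.0, 1.1, 1.2, 1.3])   # neighboring arms have similar payoffs
rough = np.array([1.0, -1.0, 1.0, -1.0])  # neighboring arms disagree
print(smoothness(A, smooth))  # approximately 0.03
print(smoothness(A, rough))   # 12.0
```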

Traditional bandit algorithms, such as LinUCB, GP-UCB, or even LinearTS, do not scale efficiently with the number of arms $N$ when $N$ is large, as is typical when the action set is the collection of graph nodes or a combinatorial set. This motivates approaches that exploit structure and, in particular, the effective dimension $d$ induced by the graph Laplacian, rather than the ambient dimension $N$.
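
The spectral basis behind this idea can be sketched in a few lines of numpy, assuming the combinatorial Laplacian $L = D - A$ and a hypothetical 5-node cycle graph with made-up payoffs. The eigenvectors give each arm $i$ a feature vector $b_i$ (row $i$ of $Q$), any payoff vector can be written as $y = Qp$, and the smoothness penalty becomes $y^T L y = \sum_j \lambda_j p_j^2$, so smooth payoffs put most of their weight on eigenvectors with small eigenvalues.

```python
import numpy as np

def spectral_basis(adj: np.ndarray):
    """Return L = D - A and its eigendecomposition L = Q diag(lam) Q^T.

    np.linalg.eigh returns eigenvalues in ascending order, so the first
    columns of Q are the smoothest (lowest-frequency) eigenvectors.
    Row i of Q plays the role of the spectral feature vector b_i of arm i.
    """
    lap = np.diag(adj.sum(axis=1)) - adj
    lam, Q = np.linalg.eigh(lap)
    return lap, lam, Q

# Hypothetical 5-node cycle graph 0-1-2-3-4-0
N = 5
A = np.zeros((N, N))
for u in range(N):
    v = (u + 1) % N
    A[u, v] = A[v, u] = 1.0

lap, lam, Q = spectral_basis(A)
y = np.array([1.0, 0.9, 0.8, 0.9, 1.0])  # made-up expected payoffs of the five arms
p = Q.T @ y                              # coordinates of y in the spectral basis

assert np.allclose(Q @ p, y)                          # y_i = b_i^T p for every arm i
assert np.isclose(y @ lap @ y, np.sum(lam * p**2))    # y^T L y = sum_j lam_j p_j^2
print(np.round(p**2 / np.sum(p**2), 3))               # energy share per eigenvector, low to high frequency
```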

Spectral Thompson Sampling Algorithm

SpectralTS is a spectral variant of Thompson Sampling, leveraging the smoothness structure via the spectral (eigenvector) basis of the graph Laplacian. At each round $t$:

The agent maintains a Gaussian posterior over the unknown parameter vector $p$ in the spectral (Laplacian) basis, with mean $\hat{p}$.
At each step, the agent samples a parameter instance $f \sim \mathcal{N}(\hat{p}, v^2 B^{-1})$, where $B$ is the current precision matrix and $v$ scales the amount of exploration.
The arm $i$ maximizing the inner product $b_i^T f$, where $b_i$ is the spectral feature vector of arm $i$, is played.
The posterior mean and precision matrix are updated with the observed payoff.
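
The four steps can be sketched in numpy as follows. This is a minimal illustration, not the authors' implementation: it assumes a prior precision $\mathrm{diag}(\lambda) + \text{reg}\cdot I$ built from the Laplacian eigenvalues, treats the exploration scale `v` and ridge term `reg` as fixed placeholders rather than the values used in the paper, and uses a hypothetical `reward_fn` environment. For clarity it refactorizes $B$ every round, which costs $O(D^3)$; reaching a quadratic per-step cost would require incremental rank-one updates.

```python
import numpy as np

def spectral_ts(Q, lam, reward_fn, T, v=0.1, reg=0.01, rng=None):
    """Minimal SpectralTS-style loop (illustrative sketch, not the authors' code).

    Q         : (N, D) matrix whose row i is the spectral feature vector b_i of arm i.
    lam       : (D,) Laplacian eigenvalues matching the columns of Q.
    reward_fn : hypothetical environment, reward_fn(arm) -> noisy payoff.
    v, reg    : exploration scale and ridge term, fixed placeholders here.
    """
    rng = np.random.default_rng() if rng is None else rng
    D = Q.shape[1]
    B = np.diag(lam) + reg * np.eye(D)  # prior precision: stronger shrinkage on high frequencies
    b_sum = np.zeros(D)                 # running sum of r_t * b_{a_t}
    arms = []
    for _ in range(T):
        p_hat = np.linalg.solve(B, b_sum)             # posterior mean
        C = np.linalg.cholesky(B)                     # B = C C^T
        z = rng.standard_normal(D)
        f = p_hat + v * np.linalg.solve(C.T, z)       # f ~ N(p_hat, v^2 B^{-1})
        a = int(np.argmax(Q @ f))                     # arm maximizing b_i^T f
        r = reward_fn(a)
        B += np.outer(Q[a], Q[a])                     # precision update
        b_sum += r * Q[a]                             # sufficient statistic for the mean
        arms.append(a)
    return arms, np.linalg.solve(B, b_sum)

# Hypothetical 6-arm path graph with payoffs increasing along the path
N = 6
A = np.zeros((N, N))
for u in range(N - 1):
    A[u, u + 1] = A[u + 1, u] = 1.0
lam, Q = np.linalg.eigh(np.diag(A.sum(axis=1)) - A)
true_y = np.linspace(0.0, 1.0, N)
rng = np.random.default_rng(0)
arms, p_hat = spectral_ts(Q, lam, lambda a: true_y[a] + 0.1 * rng.standard_normal(), T=300, rng=rng)
print(np.bincount(arms, minlength=N))  # pull counts per arm
```

The Cholesky-based sample uses the fact that if $B = CC^T$ and $z \sim \mathcal{N}(0, I)$, then $C^{-T}z$ has covariance $B^{-1}$, so $B$ never has to be inverted explicitly.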

Key characteristics include:

Regularization weighted by the Laplacian eigenvalues: components along high-eigenvalue (high-frequency) eigenvectors are penalized more strongly, which favors functions that are smooth on the graph.
Per-round computation involves only a single posterior sample and a maximization, unlike UCB-style algorithms, which must compute an explicit confidence width for every arm.
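
Both characteristics can be seen side by side in a small numpy sketch. It assumes the eigenvalue-weighted prior precision $\mathrm{diag}(\lambda) + \text{reg}\cdot I$ and hypothetical eigenvalues; the helper names and the generic optimistic rule `select_ucb` with bonus weight `beta` are illustrations, not SpectralUCB's exact confidence width.

```python
import numpy as np

def prior_precision(lam, reg=0.01):
    """Eigenvalue-weighted prior precision diag(lam) + reg * I (assumed form)."""
    return np.diag(lam) + reg * np.eye(len(lam))

def select_ts(Q, p_hat, B, v, rng):
    """TS step: one Gaussian sample f ~ N(p_hat, v^2 B^{-1}), then N inner products b_i^T f."""
    C = np.linalg.cholesky(B)
    f = p_hat + v * np.linalg.solve(C.T, rng.standard_normal(len(p_hat)))
    return int(np.argmax(Q @ f))

def select_ucb(Q, p_hat, B, beta):
    """Generic UCB-style step: needs the width sqrt(b_i^T B^{-1} b_i) for every arm i."""
    B_inv = np.linalg.inv(B)
    widths = np.sqrt(np.einsum("id,de,ie->i", Q, B_inv, Q))  # one quadratic form per arm
    return int(np.argmax(Q @ p_hat + beta * widths))

lam = np.array([0.0, 0.5, 2.0, 4.0])    # hypothetical Laplacian eigenvalues
B0 = prior_precision(lam)
prior_std = 1.0 / np.sqrt(np.diag(B0))  # per-direction prior standard deviation (v = 1)
print(np.round(prior_std, 3))           # about [10, 1.4, 0.71, 0.5]: shrinks as the eigenvalue grows
```

The TS step needs one sample and $N$ inner products, while the UCB-style step evaluates a quadratic form $b_i^T B^{-1} b_i$ for each of the $N$ arms.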

Regret Bound and Computational Complexity

The main theoretical contribution is a finite-time high-probability upper bound on cumulative regret:

Cumulative regret $R(T)$ is upper-bounded by $O(d\sqrt{T \ln N})$ with high probability, where $d$ is the effective dimension of the graph.
The analysis follows and refines martingale arguments introduced in the Thompson sampling literature, carefully adapting covering, anti-concentration, and self-normalized process arguments to the spectral (graph Laplacian) setting.
The effective dimension $d$ is typically much smaller than $N$ and describes the intrinsic complexity of learning the smooth reward function.
The computational complexity per iteration is quadratic in $N$ (per step: $O(N^2)$), a substantial improvement over the cubic or worse cost of spectral UCB algorithms, which maintain per-arm confidence sets.
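
The summary does not define the effective dimension. The sketch below assumes the definition used in the spectral bandit (SpectralUCB) line of work: $d$ is the largest index with $(d-1)\lambda_d \le T / \ln(1 + T/\text{reg})$, where $\lambda_1 \le \dots \le \lambda_N$ are the eigenvalues of $\mathrm{diag}(\lambda_L) + \text{reg}\cdot I$. Treat both the formula and the default `reg` as assumptions; the graphs are illustrative.

```python
import math
import numpy as np

def effective_dimension(lap_eigs, T, reg=0.01):
    """Largest d with (d - 1) * lam_d <= T / ln(1 + T / reg)  (assumed definition).

    lam_1 <= ... <= lam_N are the eigenvalues of diag(lap_eigs) + reg * I, 1-indexed.
    """
    lam = np.sort(np.asarray(lap_eigs, dtype=float)) + reg
    budget = T / math.log(1.0 + T / reg)
    d = 1
    for i in range(1, len(lam) + 1):       # i is the 1-based eigenvalue index
        if (i - 1) * lam[i - 1] <= budget:
            d = i                          # keep the largest index that qualifies
    return d

# Complete graph K_50: Laplacian eigenvalues are 0 (once) and 50 (49 times)
print(effective_dimension([0.0] + [50.0] * 49, T=100))  # 1

# Path graph with 1000 nodes: eigenvalues 2 - 2 cos(pi k / N), k = 0..N-1
N = 1000
path_eigs = 2.0 - 2.0 * np.cos(np.pi * np.arange(N) / N)
print(effective_dimension(path_eigs, T=100))  # roughly 100, far below N = 1000
```

Under this definition, graphs whose Laplacian spectrum grows quickly yield a small $d$ even when $N$ is large, which is what makes the $d\sqrt{T \ln N}$ bound attractive.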

The regret bound matches the best known bounds for optimistic (UCB-like) counterparts up to a logarithmic factor, which is negligible in the relevant applications; what matters there is that the bound scales with $d$ rather than $N$, since in natural datasets $N$ can be several orders of magnitude larger than $d$.
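
As a worked illustration with made-up sizes (not figures from the paper), hold $d$ and $T$ fixed and grow the number of arms a thousandfold, from $10^3$ to $10^6$:

$$\frac{d\sqrt{T \ln 10^{6}}}{d\sqrt{T \ln 10^{3}}} = \sqrt{\frac{6 \ln 10}{3 \ln 10}} = \sqrt{2} \approx 1.41$$

The bound grows by about 41%, whereas a bound proportional to the ambient dimension $N$ would grow a thousandfold.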

Experimental Validation

Empirical results on both synthetic graphs (using the Barabási-Albert model) and the MovieLens real-world collaborative filtering dataset validate the practical effectiveness of SpectralTS:

SpectralTS matches or slightly outperforms SpectralUCB in terms of regret, especially in the large-action, short-horizon regime ($T < N$), which is the relevant operational regime for recommender and advertising systems.
SpectralTS is significantly faster (lower computational time per round) than SpectralUCB, confirming the computational claims.
In contrast, generic linear bandit algorithms (LinUCB, LinearTS) perform poorly, underscoring that the exploitation of graph structure is essential in these settings.
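
A synthetic setup in the spirit of these experiments can be sketched as follows. The summary does not give the exact protocol, so everything here is an assumption: the simple preferential-attachment generator, its parameters `n` and `m`, and the choice to build smooth payoffs from the `k` lowest-frequency Laplacian eigenvectors with random coefficients.

```python
import numpy as np

def barabasi_albert(n, m, rng):
    """Simple preferential-attachment graph (Barabási-Albert style).

    Starts from a complete graph on m + 1 nodes; each later node attaches to m
    distinct existing nodes chosen with probability proportional to degree.
    """
    A = np.zeros((n, n))
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            A[u, v] = A[v, u] = 1.0
    for new in range(m + 1, n):
        deg = A[:new, :new].sum(axis=1)   # degrees of the existing nodes
        targets = rng.choice(new, size=m, replace=False, p=deg / deg.sum())
        A[new, targets] = 1.0
        A[targets, new] = 1.0
    return A

def smooth_payoffs(A, k, rng):
    """Payoffs spanned by the k lowest-frequency Laplacian eigenvectors (assumed construction)."""
    lap = np.diag(A.sum(axis=1)) - A
    lam, Q = np.linalg.eigh(lap)
    p = np.zeros(len(lam))
    p[:k] = rng.standard_normal(k)        # no weight on high-frequency components
    return Q @ p, lap, lam

rng = np.random.default_rng(0)
A = barabasi_albert(n=250, m=3, rng=rng)
y, lap, lam = smooth_payoffs(A, k=5, rng=rng)
print(int(A.sum() // 2))  # 744 edges: 6 in the seed clique + 3 per added node
```

Because `y` lies in the span of the first `k` eigenvectors, its smoothness $y^T L y$ is at most $\lambda_k \lVert y \rVert^2$, so neighboring arms in the generated graph receive similar payoffs by construction.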

Implications and Future Directions

Practical Implications:

SpectralTS provides a scalable, theoretically justified, and computationally efficient alternative to UCB-type algorithms for large, structured bandit problems. Its reliance on effective dimension rather than the ambient dimension enables new applications in extreme-scale recommender and decision systems where prior structure is encoded as a graph.

Theoretical Implications:

The frequentist regret analysis of a Thompson Sampling algorithm in this structured spectral setting closes an important gap: previous analyses either focused on average-case Bayesian regret or did not fully exploit graph-induced structure. The martingale-based analysis framework could potentially be extended to wider classes of structured and combinatorial bandit settings.

Future Developments:

Key open directions include:

Adapting SpectralTS to contextual and non-stationary environments, where user features or item attributes evolve over time.
Exploring generalization to settings where the graph is dynamic or observed incrementally.
Integrating techniques for efficient eigen-decomposition in the online setting to further reduce computational overhead for massive graphs.
Investigating robustness under model misspecification or where the true reward function deviates from strict graph smoothness assumptions.

Conclusion

Spectral Thompson Sampling expands the toolkit for large-scale bandit learning in graph-structured domains, offering a method that is both statistically and computationally efficient. Theoretical analysis shows that the regret scales linearly with the effective (spectral) dimension $d$ and only logarithmically with the number of arms $N$, and experiments confirm the practical gains in speed with no loss of performance. SpectralTS therefore constitutes a reliable and tractable approach for structured decision-making problems prevalent in modern information systems.

Reference: "Spectral Thompson sampling" (2604.13739)
