
SOMMAB: Semi-Overlapping Multi-Armed Bandit

Updated 2 January 2026
  • SOMMAB is a framework that enables efficient pure-exploration in multi-bandit problems by using semi-overlapping evaluation groups.
  • It generalizes the GapE algorithm to leverage shared feedback, which decreases overall sample complexity and improves error bounds.
  • It finds practical use in multi-task learning, federated learning, and multi-agent systems by ensuring candidate consistency via role-swapping.

The semi-overlapping multi-(multi-armed) bandit (SOMMAB) framework provides a pure-exploration model for problems in which multiple entities (such as tasks, clients, or agents) seek to identify their optimal partners, actions, or configurations, based on shared—but typically asymmetric and computationally intensive—evaluations. Distinct from classical multi-armed or multi-bandit formulations, SOMMAB allows a single evaluation to yield feedback for several bandits due to explicit structural overlap among their arms. This makes SOMMAB an efficient abstraction for sequential identification tasks in domains where composite trials inform multiple learning subproblems, including multi-task learning, federated learning, and coalition formation in multi-agent systems (Antos et al., 31 Dec 2025).

1. Model Specification

The SOMMAB model consists of $M$ entities, each of which induces a local pure-exploration bandit problem. Each entity $m$ is associated with a sparse candidate list $C_m \subset 2^{V \setminus \{m\}}$ of admissible support sets, with each $S \in C_m$ treated as a distinct arm for bandit $m$. Arm rewards are distributed as $\nu_{m,k}$, independently sampled, bounded on $[0, b]$, and with unknown mean $\mu_{m,k}$.

Arms are structured into semi-overlapping evaluation groups $G \subseteq \{(m,k)\}$, where each group $G = \{(m_1, k_1), \ldots, (m_r, k_r)\}$ represents a set of arms "pulled together" in a single trial. Pulling a group at unit cost yields a sample for each arm in $G$; with $n$ denoting the total number of group pulls, the effective samples collected satisfy $\sum_{m,k} T_{m,k}(n) \geq rn$. For each bandit $m$, the recommended arm at the conclusion of the $n$ trials is $J_m(n) \in \arg\max_k \hat{\mu}_{m,k}(n)$. A strong duality condition on candidate sets enforces that, for any $S \in C_a$ and $b \in S$, the "role-swapped" set $S' = (S \setminus \{b\}) \cup \{a\}$ must belong to $C_b$, ensuring structural coherence.

The overlap degree $r$ characterizes the minimum size of a group $G$; $r = 1$ recovers the classical, non-overlapping multi-bandit problem. The semi-overlapping design is the defining feature: each group pull simultaneously updates statistics for several arms across bandits.
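
The candidate-list structure and the role-swap condition are straightforward to state in code. The following Python sketch is illustrative only; the function name satisfies_strong_duality and the toy candidate family are assumptions chosen for this example, not taken from the paper.

```python
# Minimal sketch of SOMMAB candidate lists and the strong-duality check.
# C maps each entity m to its candidate list: a set of frozensets, where
# each frozenset S is an admissible support set, i.e. one arm of bandit m.

def satisfies_strong_duality(C):
    """Check the role-swap closure: for every S in C[a] and b in S,
    the swapped set (S - {b}) | {a} must appear in C[b]."""
    for a, candidates in C.items():
        for S in candidates:
            for b in S:
                swapped = (S - {b}) | {a}
                if swapped not in C.get(b, set()):
                    return False
    return True

# Toy example: 3 entities, each allowed to pick any single partner.
V = {0, 1, 2}
C = {m: {frozenset({v}) for v in V - {m}} for m in V}
assert satisfies_strong_duality(C)
```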

2. Generalized GapE Algorithm

The GapE algorithm, originally developed by Gabillon et al. for non-overlapping multi-bandit settings, is generalized for SOMMAB to exploit the structured overlap. The algorithm proceeds as follows (see Algorithm 3.1 in (Antos et al., 31 Dec 2025)):

  1. Initialization: Every arm is pulled $l$ times by repeatedly sampling its full evaluation group, yielding $T_{m,k} = l$ for all $(m, k)$.
  2. Adaptive Exploration: For rounds $t = lMK + 1, \ldots, n$:

    • Compute, for each arm $(m, k)$, a gap-based index

    $$B_{m,k}(t) = -\hat{\Delta}_{m,k}(t-1) + b \sqrt{\frac{a}{T_{m,k}(t-1)}}$$

    where $\hat{\Delta}_{m,k}(t)$ is the empirical gap and $a$ is the exploration parameter.
    • Select the arm with maximal index and pull its associated semi-overlapping group $G$, incrementing $T_{p,j}$ and updating means and gaps for every $(p, j) \in G$.

  3. Recommendation: Each bandit $m$ outputs $J_m(n) \in \arg\max_k \hat{\mu}_{m,k}(n)$ as its recommended arm.

This structure allows each trial to propagate information to several arms and, consequently, several bandits, significantly reducing overall sample complexity.
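
The loop above can be sketched compactly in code. The following Python implementation is a minimal illustration under assumptions not in the source: evaluation groups partition the arms, each bandit has at least two arms, budget accounting is simplified, and pull_group is a hypothetical user-supplied callback performing one composite trial.

```python
import math
from collections import defaultdict

def generalized_gape(groups, pull_group, n, l, a, b=1.0):
    """Illustrative generalized GapE for SOMMAB (not the authors' code).

    groups:     disjoint evaluation groups, each an iterable of arms (m, k)
    pull_group: one composite trial at unit cost -> {(m, k): reward in [0, b]}
    n:          total budget of group pulls; l: initialization pulls per group
    a:          exploration parameter; b: reward range bound
    """
    group_of = {arm: tuple(g) for g in groups for arm in g}
    T = defaultdict(int)      # sample counts T_{m,k}
    S = defaultdict(float)    # reward sums

    def record(samples):
        for arm, x in samples.items():
            T[arm] += 1
            S[arm] += x

    # 1. Initialization: l pulls of every group gives T_{m,k} = l for all arms.
    for g in groups:
        for _ in range(l):
            record(pull_group(tuple(g)))

    # 2. Adaptive exploration over the remaining budget (simplified accounting).
    for _ in range(n - l * len(groups)):
        mu = {arm: S[arm] / T[arm] for arm in group_of}

        def gap(arm):
            # Empirical gap: |mu_hat(arm) - best rival mean in the same bandit|.
            rival = max(mu[x] for x in mu if x[0] == arm[0] and x != arm)
            return abs(mu[arm] - rival)

        # Gap-based index B_{m,k}(t) = -gap + b * sqrt(a / T_{m,k}).
        B = {arm: -gap(arm) + b * math.sqrt(a / T[arm]) for arm in mu}
        target = max(B, key=B.get)
        record(pull_group(group_of[target]))  # pull the whole group

    # 3. Recommendation: per-bandit empirical best arm J_m(n).
    best = {}
    for arm in group_of:
        m = arm[0]
        if m not in best or S[arm] / T[arm] > S[best[m]] / T[best[m]]:
            best[m] = arm
    return best
```

Note that each adaptive step updates $r$ counters at once via record, which is precisely where the factor-$r$ budget saving discussed in Section 4 originates.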

3. Theoretical Results and Error Bounds

The primary theoretical advance consists of new exponential error bounds for best-arm identification in the SOMMAB setting, extending and sharpening those for classical multi-bandit pure-exploration.

Let $H = \sum_{m=1}^{M} \sum_{k=1}^{K_m} \frac{b^2}{\Delta_{m,k}^2}$ be the global complexity parameter, where $\Delta_{m,k} = \mu_{m,*} - \mu_{m,k}$. For any initialization $l \in [1, 152]$ and an appropriate exploration parameter $a$, if $n \geq l \sum_m K_m$, the worst-case error probability satisfies:

$$\ell(n) = \max_m \mathbb{P}\left[J_m(n) \neq k_m^*\right] \leq 2 \left(\sum_m K_m\right) n \exp\{-2ac^2\}$$

where the explicit computation of $a, c$ depends on $l, r, H$, as detailed in [(Antos et al., 31 Dec 2025), Theorem 4.2]. For $l = 152$,

$$\ell(n) < 2 \left(\sum_m K_m\right) n \exp\left(-\frac{rn - \sum_m K_m + 1}{41H - 36}\right)$$

which reveals that the exponent's dependence on $r$ is linear: each group evaluation supplies $r$ samples, so higher overlap yields sharper guarantees per unit cost.

Comparison to prior work demonstrates a factor of $2\times$ to $3.5\times$ improvement in the constant in front of $1/H$ relative to Gabillon et al. (2011), where, for $r = 1$ and $l = 1$, the exponent was $-(n - MK)/(144H)$ (Antos et al., 31 Dec 2025).
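
To make the $l = 152$ bound tangible, the short script below evaluates it on an invented toy instance; every numeric value here is an assumption chosen for illustration, not a figure from the paper.

```python
import math

# Invented instance: M = 4 bandits, K = 5 arms each, rewards in [0, 1],
# every gap taken as 0.2 for simplicity, overlap degree r = 4.
M, K, b, gap, r = 4, 5, 1.0, 0.2, 4
H = M * K * b**2 / gap**2        # global complexity parameter, here H = 500
sum_K = M * K                    # sum_m K_m = 20

def error_bound(n):
    # l = 152 bound: 2 (sum_m K_m) n exp(-(r n - sum_m K_m + 1) / (41 H - 36))
    return 2 * sum_K * n * math.exp(-(r * n - sum_K + 1) / (41 * H - 36))

for n in (50_000, 100_000, 150_000):
    print(f"n = {n:>7}: bound = {error_bound(n):.3g}")
```

The bound is vacuous at small budgets and then decays exponentially; in this toy instance with $r = 4$, roughly $1.5 \times 10^5$ group pulls suffice for a guarantee on the order of $10^{-6}$.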

4. Sample Complexity and Budget Trade-offs

Improved error constants translate to practical reductions in sampling cost. To guarantee error probability $\ell(n) \leq \delta$, the sample complexity satisfies

$$rn \gtrsim 41H \left[\log(2MKn) + \log(1/\delta)\right] + MK$$

The effective number of group pulls $n$ is thus reduced by a factor of $r$ compared to the non-overlapping MMAB setting, as each group evaluation aids $r$ arms: $n \approx (H/r) \cdot O(\log(MK/\delta))$. In typical applications, when $r \approx M$, SOMMAB approaches the efficiency of a single-bandit problem despite the multi-bandit structure, representing substantial computational and statistical gains in sparsely coupled settings (Antos et al., 31 Dec 2025).
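
As a sanity check on this budget relation, one can solve the display above for $n$ by fixed-point iteration. The sketch below reuses the toy instance from Section 3 and is illustrative only; the helper required_pulls is an assumption, not an artifact from the paper.

```python
import math

def required_pulls(H, M, K, r, delta, iters=60):
    """Approximate smallest n with
    r n >= 41 H [log(2 M K n) + log(1/delta)] + M K, via fixed-point iteration
    (the log term varies slowly in n, so the iteration contracts)."""
    n = float(M * K)
    for _ in range(iters):
        n = (41 * H * (math.log(2 * M * K * n) + math.log(1 / delta)) + M * K) / r
    return math.ceil(n)

# Toy instance from above (H = 500, M = 4, K = 5): the budget shrinks ~1/r,
# up to the slowly varying log factor.
for r in (1, 2, 4):
    print(f"r = {r}: n ~ {required_pulls(H=500, M=4, K=5, r=r, delta=0.01):,}")
```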

5. Applications Across Learning Paradigms

SOMMAB formalizes and unifies several families of learning problems where shared evaluations with asymmetric feedback are intrinsic. The mapping between specific domains and SOMMAB’s primitives is as follows:

Domain | Entity/Bandit Interpretation | Arm Interpretation
Multi-task/auxiliary-task learning | Task | Subset of auxiliary tasks
Federated learning | Client | Set of collaborating clients
Multi-agent coalition formation | Agent | Set of partner agents
  • In multi-task and auxiliary-task learning, group trials correspond to jointly training various target and auxiliary sets; overlap arises as multi-target configurations simultaneously update relevant statistics for all involved tasks.
  • For federated learning, each entity is a client and arms are coalitions; practical semi-overlap emerges due to joint rounds of communication and computation, and the algorithmic structure informs orchestrated coalition selection at the cloud level.
  • In multi-agent systems, the model identifies optimal asymmetric help-sets via stochastic cooperative task execution, leveraging the overlap from coalitional execution to improve identification efficiency.

The ability to re-use a single composite evaluation across multiple interlocking bandits is the source of SOMMAB’s statistical and computational savings, as quantified explicitly by the dependence on rr in the core bounds (Antos et al., 31 Dec 2025).

6. Structural Properties and Practical Implications

Strong duality and role-swapping enforce that the candidate structures across all entities remain consistent, which is essential for meaningful structural sharing. SOMMAB is especially well-suited to scenarios with sparse candidate lists but high potential for evaluation overlap, making it particularly beneficial in large-scale networked or multi-agent contexts with asymmetric partner selection needs.

A plausible implication is that in environments where practical constraints enforce frequent joint evaluations (due to cost, privacy, or communication limitations), SOMMAB’s sample-complexity benefits may be essential for scalable, data-efficient support network identification.

Further formal definitions, model details, lemmas, proofs, and algorithmic subtleties are provided in [(Antos et al., 31 Dec 2025), Sections 2–4].
