Fixed-Confidence Piecewise Constant Bandit Problem
- The fixed-confidence piecewise constant bandit problem is a sequential decision task aimed at accurately identifying abrupt change points in a piecewise constant function with high statistical confidence.
- The methodology concentrates sampling on arms adjacent to potential change points, allocating samples in proportion to inverse squared change magnitudes to minimize sample complexity.
- Asymptotically optimal algorithms, such as the MCPI variant of Track-and-Stop, demonstrate significant empirical efficiency gains over approaches that do not exploit the piecewise constant structure.
The fixed-confidence piecewise constant bandit problem concerns the rapid, reliable identification of abrupt changes (change points) in a function observed under bandit (sequential, noisy sampling) feedback. In this setting, the function of interest is piecewise constant across a discretized domain, and the objective is to locate change points with high statistical confidence while minimizing the required sample complexity. This problem arises in fields such as quality control, materials science, and sequential experimentation, where detecting structural breaks is more important than merely optimizing function values.
1. Formal Problem Definition
Consider a discretized action space indexed by $k \in \{1, \dots, K\}$, with each arm $k$ associated with a mean reward $\mu_k$. The means are assumed piecewise constant: for a set of unknown "change points" $\mathcal{C} = \{c_1, \dots, c_N\} \subseteq \{1, \dots, K-1\}$, all indices between two consecutive change points share the same mean, and the value jumps at each $c_i$ (that is, $\mu_{c_i} \neq \mu_{c_i + 1}$). The observation model typically assumes Gaussian rewards $X \sim \mathcal{N}(\mu_k, \sigma^2)$ for each arm $k$, with known variance $\sigma^2$.
The fixed-confidence identification task is to find the set of change points such that, with probability at least $1 - \delta$, the identified set $\hat{\mathcal{C}}$ is correct: $\mathbb{P}(\hat{\mathcal{C}} = \mathcal{C}) \geq 1 - \delta$. The stopping time $\tau_\delta$ should be minimized in expectation. Variants include:
- Exact-(N, δ): All change points must be detected exactly.
- Any-(N, δ): Output a set of detected change points that is guaranteed to be a subset of the true changes.
Change magnitudes are denoted $\Delta_i = |\mu_{c_i + 1} - \mu_{c_i}|$ for the $i$th change point.
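To make the setup concrete, the following minimal sketch (hypothetical instance and names) encodes a Gaussian piecewise constant bandit and recovers its change points and magnitudes from the mean vector:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical instance: K = 10 arms with N = 2 change points.
mu = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.5, 0.5])
sigma = 1.0  # known noise standard deviation

def pull(k: int) -> float:
    """Draw one Gaussian observation from arm k."""
    return rng.normal(mu[k], sigma)

# Change points are indices c with mu[c] != mu[c+1];
# magnitudes are Delta_i = |mu[c_i + 1] - mu[c_i]|.
jumps = np.diff(mu)
change_points = np.flatnonzero(jumps != 0)   # array([3, 7])
deltas = np.abs(jumps[change_points])        # array([1. , 0.5])
```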
2. Instance-Dependent Lower Bounds and Complexity
The problem admits tight, instance-dependent lower bounds on the sample complexity. For a single change point of magnitude $\Delta$, the expected sample complexity of any valid ($\delta$-correct) policy must satisfy
$$\mathbb{E}[\tau_\delta] \geq \frac{8\sigma^2}{\Delta^2} \log\left(\frac{1}{2.4\,\delta}\right).$$
For $N$ change points, the bound generalizes as
$$\mathbb{E}[\tau_\delta] \geq 8\sigma^2 \left(\sum_{i=1}^{N} \frac{1}{\Delta_i^2}\right) \log\left(\frac{1}{2.4\,\delta}\right).$$
Thus, the number of samples required to confidently detect a change grows inversely with the square of the jump size at each change point. This result is derived via change-of-measure arguments and a max-min optimization over allocation strategies and confusing alternative instances.
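As an illustration, the characteristic time implied by these bounds can be evaluated directly; the helper below (hypothetical name, applied to the instance sketched earlier) reports the sample cost per unit of $\log(1/\delta)$:

```python
def characteristic_time(deltas, sigma=1.0):
    """T* = 8 * sigma^2 * sum_i 1 / Delta_i^2, so E[tau] >~ T* log(1/delta).

    The constant 8 follows from the half/half allocation on the two arms
    adjacent to each change discussed in Section 3.
    """
    deltas = np.asarray(deltas, dtype=float)
    return 8.0 * sigma**2 * np.sum(1.0 / deltas**2)

# For deltas = [1.0, 0.5]: 8 * (1 + 4) = 40 samples per unit of log(1/delta).
print(characteristic_time([1.0, 0.5]))  # 40.0
```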
3. Optimal Sampling Strategies and Allocation
The lower bound analysis dictates that, for efficient fixed-confidence identification, sampling should be concentrated on arms adjacent to change points. For the case of a single change at $c$, the optimal allocation is
$$w^*_c = w^*_{c+1} = \tfrac{1}{2}, \qquad w^*_k = 0 \text{ otherwise.}$$
For multiple change points, the optimal allocation places weight on the arms adjacent to all changes, proportional to $\Delta_i^{-2}$ for each change $c_i$. Consequently, arms not immediately adjacent to a change are sampled rarely, if at all, in the asymptotic regime. This sharp focus eliminates wasted sampling, enabling significant gains over strategies that do not exploit the piecewise constant structure.
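A minimal sketch of this allocation rule, assuming the weight for each change $c_i$ is split equally between arms $c_i$ and $c_i + 1$ and normalized so the proportions sum to one:

```python
def optimal_allocation(change_points, deltas, num_arms):
    """Asymptotically optimal sampling proportions w*.

    Each change c_i receives total weight proportional to 1 / Delta_i^2,
    split equally between the two adjacent arms; all other arms get 0.
    """
    w = np.zeros(num_arms)
    raw = 1.0 / np.asarray(deltas, dtype=float)**2
    raw /= 2.0 * raw.sum()                 # two arms per change, sum(w) = 1
    for c, wi in zip(change_points, raw):
        w[c] += wi
        w[c + 1] += wi
    return w

# Single change: weights 1/2 on each of the two adjacent arms.
print(optimal_allocation([3], [1.0], num_arms=10))
```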
4. Asymptotically Optimal Algorithms: Track-and-Stop Variants
Leveraging this allocation, a computationally efficient Track-and-Stop-style algorithm is introduced. In the single change point scenario ("CPI"):
- At each round $t$, identify the candidate change point as the index $\hat{c}_t$ maximizing the empirical adjacent gap $|\hat{\mu}_{k+1}(t) - \hat{\mu}_k(t)|$.
- Allocate arm pulls with proportions $1/2$ each to the two candidate indices $\hat{c}_t$ and $\hat{c}_t + 1$.
- Enforce forced exploration so that every arm has been sampled at least on the order of $\sqrt{t}$ times by round $t$.
A stopping rule is established using a generalized likelihood ratio statistic
$$Z(t) = \frac{N_{\hat{c}_t}(t)\, N_{\hat{c}_t+1}(t)}{N_{\hat{c}_t}(t) + N_{\hat{c}_t+1}(t)} \cdot \frac{\big(\hat{\mu}_{\hat{c}_t+1}(t) - \hat{\mu}_{\hat{c}_t}(t)\big)^2}{2\sigma^2},$$
and the algorithm stops when $Z(t)$ rises above a threshold
$$\beta(t, \delta) = \log\left(\frac{C t}{\delta}\right),$$
where $C$ is a fixed constant.
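A compact sketch of the CPI loop under the definitions above (the names, the forced exploration schedule, and the constant $C$ are illustrative assumptions, not a verbatim transcription of the published algorithm):

```python
def cpi(pull, num_arms, delta, sigma=1.0, c_const=2.0, max_rounds=100_000):
    """Single change point identification: track, test, stop."""
    counts = np.zeros(num_arms)
    sums = np.zeros(num_arms)
    for t in range(1, max_rounds + 1):
        means = sums / np.maximum(counts, 1)
        c_hat = int(np.argmax(np.abs(np.diff(means))))  # candidate change
        # Forced exploration: any arm sampled fewer than ~sqrt(t) times.
        starved = np.flatnonzero(counts < np.sqrt(t) - num_arms / 2)
        if starved.size > 0:
            k = int(starved[0])
        else:
            # Track the 1/2-1/2 allocation on arms c_hat and c_hat + 1.
            k = c_hat if counts[c_hat] <= counts[c_hat + 1] else c_hat + 1
        sums[k] += pull(k)
        counts[k] += 1

        # GLR stopping statistic Z(t) against threshold beta(t, delta).
        means = sums / np.maximum(counts, 1)
        c_hat = int(np.argmax(np.abs(np.diff(means))))
        n1, n2 = counts[c_hat], counts[c_hat + 1]
        if n1 > 0 and n2 > 0:
            gap = means[c_hat + 1] - means[c_hat]
            z = (n1 * n2 / (n1 + n2)) * gap**2 / (2 * sigma**2)
            if z > np.log(c_const * t / delta):
                return c_hat, t  # declared change point and stopping time
    return None, max_rounds
```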
For multiple change points, the MCPI (Multiple Change Point Identification) algorithm iteratively identifies one change at a time and locks in candidates once a separation criterion is met. Asymptotically, this algorithm satisfies
$$\limsup_{\delta \to 0} \frac{\mathbb{E}[\tau_\delta]}{\log(1/\delta)} \leq 8\sigma^2 \sum_{i=1}^{N} \frac{1}{\Delta_i^2},$$
matching the derived lower bounds up to constants.
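A rough skeleton of the multiple-change-point loop, assuming the Exact-(N, δ) variant (known $N$) and a simple lock-on-threshold rule standing in for MCPI's exact separation criterion:

```python
def mcpi(pull, num_arms, n_changes, delta, sigma=1.0, c_const=2.0,
         max_rounds=1_000_000):
    """Iteratively identify and lock change points, one at a time."""
    counts = np.zeros(num_arms)
    sums = np.zeros(num_arms)
    locked = []                              # change points locked so far
    for t in range(1, max_rounds + 1):
        means = sums / np.maximum(counts, 1)
        gaps = np.abs(np.diff(means))
        gaps[locked] = 0.0                   # focus on unresolved changes
        c_hat = int(np.argmax(gaps))
        starved = np.flatnonzero(counts < np.sqrt(t) - num_arms / 2)
        k = int(starved[0]) if starved.size else (
            c_hat if counts[c_hat] <= counts[c_hat + 1] else c_hat + 1)
        sums[k] += pull(k)
        counts[k] += 1

        means = sums / np.maximum(counts, 1)
        n1, n2 = counts[c_hat], counts[c_hat + 1]
        if n1 > 0 and n2 > 0:
            gap = means[c_hat + 1] - means[c_hat]
            z = (n1 * n2 / (n1 + n2)) * gap**2 / (2 * sigma**2)
            if z > np.log(c_const * t / delta):
                locked.append(c_hat)         # lock this change, move on
                if len(locked) == n_changes:
                    return sorted(locked), t
    return sorted(locked), max_rounds
```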
5. Practical Implementation and Computational Efficiency
The explicit nature of the MCPI algorithm ensures low computational overhead. Given the structure of the objective, the estimator and allocation computation can be performed in closed form without online optimization or large-scale likelihood maximization, unlike in generic Track-and-Stop for arbitrary models.
The forced exploration mechanism provides robustness, preventing premature convergence to suboptimal candidates, especially in early rounds when empirical means are unreliable.
Regarding stopping thresholds, logarithmic scaling in both $t$ and $1/\delta$ ensures statistically sound early stopping while avoiding over-sampling once enough evidence has accumulated.
6. Empirical Validation
Experiments on synthetic domains support both the theoretical findings and the practical efficiency of MCPI. For cases with one or several change points, average stopping times scale in parallel with the theoretical lower bounds as a function of $\log(1/\delta)$. Competing methods that do not exploit the known piecewise constant structure require substantially more samples for the same confidence, particularly when changes are small (small $\Delta_i$).
For example, in environments with a single change of magnitude $\Delta$, MCPI achieves stopping times tightly tracking the bound $\frac{8\sigma^2}{\Delta^2} \log(1/\delta)$ across a broad range of target confidences $\delta$.
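As a hypothetical reproduction of this kind of experiment, the CPI sketch above can be run on a single-change instance and compared against the bound (illustrative only, not the paper's actual environments):

```python
# Single change of magnitude Delta = 1.0, so T* = 8 * sigma^2 / Delta^2 = 8.
mu = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
rng = np.random.default_rng(1)

for delta in (1e-1, 1e-2, 1e-3):
    stops = [cpi(lambda k: rng.normal(mu[k], 1.0), num_arms=6, delta=delta)[1]
             for _ in range(20)]
    print(f"delta={delta:g}: mean stop {np.mean(stops):.0f}, "
          f"bound {8.0 * np.log(1.0 / delta):.0f}")
```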
7. Connections, Extensions, and Context
The theoretical underpinnings and algorithmic designs build directly on foundational results for best-arm identification and pure exploration with fixed confidence, notably the optimal allocation and stopping rules from classical Track-and-Stop approaches (Garivier & Kaufmann, 2016). The contribution in the piecewise constant context lies in exploiting the locality of abrupt changes to achieve optimally localized sampling.
Further, the fixed-confidence approach is complementary to fixed-budget analyses of piecewise constant bandits (Lazzaro et al., 22 Jan 2025), which provide non-asymptotic guarantees in the finite-sample regime, and to change-point detection mechanisms in piecewise stationary or restless bandits.
The piecewise constant framework under bandit feedback addresses regimes common in scientific and industrial applications, such as active edge detection or abrupt transition mapping, providing both theoretical foundations and computationally feasible procedures that can be implemented in high-throughput, sequential data acquisition scenarios.