
Fixed-Confidence Piecewise Constant Bandit Problem

Updated 15 July 2025
  • The fixed-confidence piecewise constant bandit problem is a sequential decision task aimed at accurately identifying abrupt change points in a piecewise constant function with high statistical confidence.
  • The methodology concentrates sampling on arms adjacent to potential change points, with allocations scaling as the inverse squared change magnitudes, to minimize sample complexity.
  • Empirical validations and asymptotically optimal algorithms, like the MCPI variant, demonstrate significant efficiency gains over approaches that do not exploit the piecewise constant structure.

The fixed-confidence piecewise constant bandit problem concerns the rapid, reliable identification of abrupt changes (change points) in a function observed under bandit (sequential, noisy sampling) feedback. In this setting, the function of interest is piecewise constant across a discretized domain, and the objective is to locate change points with high statistical confidence while minimizing the required sample complexity. This problem arises in fields such as quality control, materials science, and sequential experimentation, where detecting structural breaks is more important than merely optimizing function values.

1. Formal Problem Definition

Consider a discretized action space indexed by $i = 1, 2, \ldots, K$, with each arm $i$ associated with a mean reward $\mu_i$. The means are assumed piecewise constant: for a set of unknown "change points" $\{x^*_1, \ldots, x^*_N\}$, all indices between two change points share the same mean, and the value jumps at each $x^*_j$. The observation model typically assumes Gaussian rewards for each arm with known variance $\sigma^2$.

The fixed-confidence identification task is to find the set of change points $\{x^*_1, \ldots, x^*_N\}$ such that, with probability at least $1-\delta$, the identified set is correct: $\mathbb{P}_v(\hat{x}_\tau = \{x^*_1, \ldots, x^*_N\}) > 1-\delta$. The stopping time $\tau$ should be minimized in expectation. Variants include:

  • Exact-$(N, \delta)$: All $N$ change points must be detected exactly.
  • Any-$(N, \delta)$: Output any $N$ detected change points, which form a subset of the actual changes.

Change magnitudes are denoted $\Delta_j = |\mu_{x^*_j} - \mu_{x^*_j+1}|$ for the $j$th change point.
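To make the setting concrete, here is a minimal sketch of the observation model in Python. It assumes unit-variance Gaussian noise; the instance reuses the nine-arm mean vector from the empirical section below, and the function name `pull` is an illustrative convention, not from the source.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative instance: K = 9 arms with a single change point between
# arms 6 and 7 (1-indexed), jump magnitude Delta = 1.
mu = np.array([2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 1.0, 1.0, 1.0])
sigma = 1.0  # known noise standard deviation

def pull(arm: int) -> float:
    """Draw one noisy observation from arm `arm` (0-indexed): N(mu[arm], sigma^2)."""
    return rng.normal(mu[arm], sigma)
```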

2. Instance-Dependent Lower Bounds and Complexity

The problem admits tight, instance-dependent lower bounds on the sample complexity. For a single change point $x^*$ of magnitude $\Delta$, the expected sample complexity for any valid policy must satisfy:

$$\mathbb{E}[\tau] \geq \frac{8\sigma^2}{\Delta^2} \log\left(\frac{1}{4\delta}\right)$$

For $N$ change points, the bound generalizes as:

$$\mathbb{E}[\tau] \geq 4\sigma^2 \log\left(\frac{1}{4\delta}\right) \sum_{j=1}^N \frac{1}{\Delta_j^2}$$

Thus, the required number of samples to confidently detect a change grows inversely with the square of the jump size at each change point. This result is derived via change-of-measure arguments and a max-min optimization problem over allocation strategies $\alpha$ and possible alternatives.
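As a numeric illustration, both bounds can be evaluated directly; the parameter values here are chosen for this sketch, not taken from the source.

```python
import numpy as np

def single_change_lower_bound(sigma, gap, delta):
    """E[tau] >= (8 sigma^2 / Delta^2) * log(1 / (4 delta))."""
    return 8 * sigma**2 / gap**2 * np.log(1 / (4 * delta))

def multi_change_lower_bound(sigma, gaps, delta):
    """E[tau] >= 4 sigma^2 log(1 / (4 delta)) * sum_j 1 / Delta_j^2."""
    return 4 * sigma**2 * np.log(1 / (4 * delta)) * np.sum(1 / np.asarray(gaps) ** 2)

print(single_change_lower_bound(1.0, 1.0, 0.05))        # ~12.9 samples
print(multi_change_lower_bound(1.0, [1.0, 0.5], 0.05))  # ~32.2 samples
```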

3. Optimal Sampling Strategies and Allocation

The lower bound analysis dictates that, for efficient fixed-confidence identification, sampling should be concentrated adjacent to change points. For the case of a single change:

$$\alpha^*_i = \begin{cases} \frac{1}{2} & \text{if } i = x^* \text{ or } i = x^*+1, \\ 0 & \text{otherwise.} \end{cases}$$

For multiple change points, the optimal allocation places weight adjacent to all changes, proportional to $1/\Delta_j^2$ for each change $j$. Consequently, arms not immediately adjacent to a change are sampled rarely, if at all, in the asymptotic regime. This sharp focus reduces wasted sampling, enabling significant gains over strategies that do not exploit the piecewise constant structure.
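A minimal sketch of this allocation follows, assuming the change points and gaps are known (as in the oracle analysis); splitting each pair's weight equally between the two adjacent arms is an assumption of this sketch, consistent with the single-change case.

```python
import numpy as np

def oracle_allocation(K, change_points, gaps):
    """Weight the arms adjacent to each change point j proportionally to
    1 / Delta_j^2, splitting each pair's weight equally; all other arms
    receive zero weight."""
    alpha = np.zeros(K)
    weights = 1 / np.asarray(gaps, dtype=float) ** 2
    weights /= weights.sum()
    for x, w in zip(change_points, weights):
        alpha[x] += w / 2       # arm x*   (0-indexed)
        alpha[x + 1] += w / 2   # arm x* + 1
    return alpha

# A single change between arms 5 and 6 (0-indexed) recovers alpha = 1/2 each.
print(oracle_allocation(9, change_points=[5], gaps=[1.0]))
```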

4. Asymptotically Optimal Algorithms: Track-and-Stop Variants

Leveraging this allocation, a computationally efficient Track-and-Stop-style algorithm is introduced. In the single change point scenario ("CPI"):

  • At each round, identify the candidate change point as the index $a$ maximizing $|\hat{\mu}_a(t) - \hat{\mu}_{a+1}(t)|$.
  • Allocate arm pulls with proportions $1/2$ each to the two candidate indices.
  • Enforce forced exploration so that every arm is sampled at least $\sqrt{t}$ times.

A stopping rule is established using a statistic

$$Z(t) = \frac{T_{\hat{x}_t}(t)\, T_{\hat{x}_t+1}(t)}{2\left(T_{\hat{x}_t}(t) + T_{\hat{x}_t+1}(t)\right)} \, \hat{\Delta}_t^2$$

and stops when $Z(t)$ rises above a threshold

$$\beta(t, \delta) = \log\left(\frac{t \gamma (K-1)}{\delta}\right) + 8 \log \log\left(\frac{t \gamma (K-1)}{\delta}\right)$$

where $\gamma$ is a fixed constant.
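Combining the sampling rule, the statistic $Z(t)$, and the threshold $\beta(t, \delta)$, the following is a minimal sketch of the CPI loop, assuming the `pull` function from the environment sketch above. The tie-breaking choices, the value $\gamma = 1$, and the explicit $\sigma^2$ normalization of $Z(t)$ (the source states the statistic with $\sigma$ implicit) are assumptions of this sketch.

```python
import numpy as np

def cpi(pull, K, sigma=1.0, delta=0.05, gamma=1.0, max_steps=100_000):
    """Sketch of single change point identification (CPI) with forced exploration."""
    counts = np.zeros(K)
    sums = np.zeros(K)

    def sample(a):
        counts[a] += 1
        sums[a] += pull(a)

    for a in range(K):  # one initial pull per arm
        sample(a)

    for t in range(K + 1, max_steps):
        means = sums / counts
        gaps = np.abs(np.diff(means))   # |mu_hat_a(t) - mu_hat_{a+1}(t)|
        x_hat = int(np.argmax(gaps))    # candidate change point

        # Forced exploration: keep every arm's count at least sqrt(t).
        under = np.flatnonzero(counts < np.sqrt(t))
        if under.size > 0:
            sample(int(under[0]))
        else:
            # Track the 1/2-1/2 allocation over the candidate pair.
            sample(x_hat if counts[x_hat] <= counts[x_hat + 1] else x_hat + 1)

        # Stopping rule: Z(t) against beta(t, delta).
        n1, n2 = counts[x_hat], counts[x_hat + 1]
        z = n1 * n2 / (2 * (n1 + n2)) * gaps[x_hat] ** 2 / sigma**2
        log_term = np.log(t * gamma * (K - 1) / delta)
        if z > log_term + 8 * np.log(log_term):
            return x_hat  # change declared between arms x_hat and x_hat + 1
    return None
```

On the nine-arm instance above, `cpi(pull, K=9)` typically returns index 5 (0-indexed), i.e., the change between the sixth and seventh arms.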

For multiple change points, the MCPI (Multiple Change Point Identification) algorithm iteratively identifies one change at a time and locks candidates when a separation criterion is met. Asymptotically, this algorithm satisfies

$$\limsup_{\delta \to 0} \frac{\mathbb{E}[\tau]}{\log(1/\delta)} \leq 8 \sigma^2 \sum_{j=1}^N \frac{1}{\Delta_j^2}$$

matching the derived lower bounds up to a constant factor (a factor of 2 relative to the multi-change lower bound above).
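The source does not spell out the separation criterion, so the following is only a hypothetical outline of the lock-in step: given current empirical means and counts, it repeatedly takes the adjacent pair with the largest empirical gap and locks it once its $Z$ statistic clears a threshold.

```python
import numpy as np

def lock_candidates(means, counts, N, beta):
    """Hypothetical MCPI-style lock-in: scan adjacent pairs in order of
    decreasing empirical gap, locking a pair once its Z statistic exceeds
    beta. The criterion Z > beta is an assumption of this sketch."""
    gaps = np.abs(np.diff(means))
    locked = []
    for a in np.argsort(gaps)[::-1]:  # largest empirical gaps first
        n1, n2 = counts[a], counts[a + 1]
        if n1 * n2 / (2 * (n1 + n2)) * gaps[a] ** 2 > beta:
            locked.append(int(a))
        if len(locked) == N:
            break
    return sorted(locked)
```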

5. Practical Implementation and Computational Efficiency

The explicit nature of the MCPI algorithm ensures low computational overhead. Given the structure of the objective, the estimator and allocation computation can be performed in closed form without online optimization or large-scale likelihood maximization, unlike in generic Track-and-Stop for arbitrary models.

The forced exploration mechanism provides robustness, preventing premature convergence to suboptimal candidates, especially in early rounds when empirical means are unreliable.

Regarding stopping thresholds, logarithmic scaling in both $t$ and $1/\delta$ ensures statistically sound early stopping while avoiding over-sampling once enough evidence has accumulated.

6. Empirical Validation

Experiments on synthetic domains support both the theoretical findings and the practical efficiency of MCPI. For cases with one or several change points, average stopping times scale in parallel with the theoretical lower bounds as a function of $\log(1/\delta)$. Competing methods that do not exploit the known piecewise constant structure require substantially more samples for the same confidence, particularly when changes are small (small $\Delta_j$).

For example, in environments with

$$\mu = (2, 2, 2, 2, 2, 2, 1, 1, 1)$$

MCPI achieves stopping times tightly tracking the bound

$$\mathbb{E}[\tau] \geq \frac{8 \sigma^2}{\Delta^2} \log\left(\frac{1}{4\delta}\right)$$

across a broad range of target confidences.
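For reference, the bound being tracked in this experiment can be tabulated across confidence levels; this check uses $\sigma = 1$ and $\Delta = 1$, matching the single unit jump in the mean vector above.

```python
import numpy as np

sigma, gap = 1.0, 1.0  # mu = (2,...,2,1,1,1) has one jump of size 1
for delta in [0.1, 0.05, 0.01, 0.001]:
    bound = 8 * sigma**2 / gap**2 * np.log(1 / (4 * delta))
    print(f"delta = {delta:<6}: E[tau] >= {bound:5.1f}")
```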

7. Connections, Extensions, and Context

The theoretical underpinnings and algorithmic designs build directly on foundational results for best-arm identification and pure exploration with fixed confidence, notably the optimal allocation and stopping rules from classical Track-and-Stop approaches (Garivier et al., 2016). The contribution in the piecewise constant context lies in exploiting the locality of abrupt changes to achieve optimally localized sampling.

Further, the fixed-confidence approach is complementary to fixed-budget analyses of piecewise constant bandits (Lazzaro et al., 22 Jan 2025), which provide non-asymptotic guarantees in the finite-sample regime, and to change-point detection mechanisms in piecewise stationary or restless bandits.

The piecewise constant framework under bandit feedback addresses regimes common in scientific and industrial applications, such as active edge detection or abrupt transition mapping, providing both theoretical foundations and computationally feasible procedures that can be implemented in high-throughput, sequential data acquisition scenarios.
