
Sequential Diagnosis Benchmark (SDBench)

Updated 1 July 2025
  • Sequential Diagnosis Benchmark (SDBench) provides formalized frameworks, datasets, and methodologies for rigorously evaluating algorithms performing sequential diagnosis under uncertainty.
  • At its core, SDBench utilizes a Bayesian model to address the joint problem of detecting an unobservable change in a system's state and identifying the specific new regime or fault type.
  • SDBench serves as a unified benchmark that subsumes classic problems like change detection and multi-hypothesis testing, applicable across diverse domains such as industrial fault isolation.

The Sequential Diagnosis Benchmark (SDBench) refers to formalized frameworks, datasets, and methodologies for evaluating algorithms and systems that perform sequential diagnosis—a process where evidence is acquired and actions are taken in sequence to efficiently identify the true state (often the fault or cause) of a system under uncertainty. SDBench structures benchmark problems and evaluation criteria to unify and rigorously assess detection, isolation, and identification strategies across diverse domains, including industrial fault isolation, multi-class hypothesis testing, and large-scale combinatorial systems.

1. Bayesian Sequential Diagnosis Model

At the core of SDBench is a rigorous Bayesian formulation of the sequential diagnosis problem. Observations $(X_n)_{n \geq 1}$ are initially drawn from a known distribution $P_0$, but at an unobservable “disorder time” $\theta$ the distribution shifts to one of $M$ alternatives $P_\mu$ ($\mu \in \{1, \ldots, M\}$); both $\theta$ and $\mu$ are unknown. The sequential diagnosis problem is then to:

  1. Detect the change as soon as possible after it occurs,
  2. Identify which new regime $P_\mu$ the process has entered with maximal accuracy.

This setup is embedded in a Bayesian framework, with prior distributions for both the disorder time $\theta$ (often a geometric law) and the regime identity $\mu$. The expected loss of a strategy $\delta = (\tau, d)$, where $\tau$ is the stopping time and $d$ is the declared regime, includes terms for detection delay, false alarm, and misidentification cost:

$$R(\delta) = c\,\mathbb{E}[(\tau-\theta)^+] + \mathbb{E}\big[a_{0d}\mathbf{1}_{\{\tau<\theta\}} + a_{\mu d}\mathbf{1}_{\{\theta \leq \tau < \infty\}}\big]$$

where $c$ is the per-period delay cost and $a_{0j}$, $a_{ij}$ are the false-alarm and misidentification penalties.

The online belief state is the vector of posterior probabilities

$$\Pi_n^{(0)} = \mathbb{P}\{\theta > n \mid \mathcal{F}_n\},\qquad \Pi_n^{(i)} = \mathbb{P}\{\theta \leq n,\ \mu = i \mid \mathcal{F}_n\},$$

where $\mathcal{F}_n$ is the observation history; these probabilities are recursively updated and form a sufficient Markov statistic on the probability simplex.
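The posterior recursion can be sketched as follows; the geometric hazard rate, regime prior, and density callables in the example are illustrative assumptions, not part of the benchmark specification, and `posterior_update` is a hypothetical helper name.

```python
import numpy as np

def posterior_update(pi, x, p, nu, densities):
    """One Bayes update of the belief vector pi = (pi_0, pi_1, ..., pi_M).

    pi[0] = P(theta > n | F_n);  pi[i] = P(theta <= n, mu = i | F_n).
    p         -- geometric hazard rate of the disorder time theta
    nu        -- prior over regimes, nu[i-1] = P(mu = i)
    densities -- M+1 callables [f_0, f_1, ..., f_M] with f_k(x) >= 0
    """
    M = len(pi) - 1
    unnorm = np.empty(M + 1)
    # No change yet: theta remains in the future, observation drawn from f_0.
    unnorm[0] = (1 - p) * pi[0] * densities[0](x)
    # Regime i: either already active, or the change occurs at this step.
    for i in range(1, M + 1):
        unnorm[i] = (pi[i] + p * nu[i - 1] * pi[0]) * densities[i](x)
    return unnorm / unnorm.sum()
```

Feeding a stream of observations through this update yields the sufficient Markov statistic described above; the normalizing constant is the one-step predictive density of the new observation.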

2. Optimal Sequential Decision Strategy

Analyzing the Bayesian sequential change diagnosis problem reduces to an optimal stopping problem on the Markov process of posterior probabilities. The Bayes risk is expressed as

$$R(\delta) = \mathbb{E}\left[\sum_{n=0}^{\tau-1} c\,(1-\Pi_n^{(0)}) + \mathbf{1}_{\{\tau < \infty\}} \sum_{j=1}^{M}\mathbf{1}_{\{d=j\}} \sum_{i=0}^{M} a_{ij}\,\Pi_\tau^{(i)}\right].$$

The dynamic programming equation for the value function $V_0(\pi)$ is

$$V_0(\pi) = \min\{h(\pi),\; c(1-\pi_0) + (T V_0)(\pi)\},$$

where $h(\pi) = \min_{1 \leq j \leq M} \sum_{i=0}^{M} a_{ij}\,\pi_i$ is the minimal expected terminal loss and $(T V_0)(\pi)$ is the expected cost-to-go after a new observation. The optimal policy is to stop as soon as the posterior vector $\pi_n$ enters the stopping region $\Gamma$, defined by $V_0(\pi) = h(\pi)$, and to declare the regime attaining the minimum in $h(\pi_\tau)$.
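The terminal decision rule implied by $h(\pi)$ can be written as a small sketch; the penalty-matrix layout and the function name are assumptions for illustration.

```python
import numpy as np

def terminal_loss_and_decision(pi, a):
    """Minimal expected terminal loss h(pi) and the Bayes-optimal declaration.

    pi -- posterior vector (pi_0, pi_1, ..., pi_M)
    a  -- (M+1) x M penalty matrix: a[i][j-1] is the cost of declaring
          regime j when the true index is i (row 0 holds false-alarm costs)
    """
    a = np.asarray(a, dtype=float)
    expected = pi @ a             # expected loss of each declaration j = 1..M
    j = int(np.argmin(expected))  # 0-based column index -> regime j + 1
    return expected[j], j + 1
```

Under symmetric penalties this reduces to declaring the most likely regime; with asymmetric costs the two rules can differ, which is why the declaration is tied to $h$ rather than to the posterior mode.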

For practical implementation, the recursion and integration require discretization of the posterior simplex and precomputation of stopping boundaries, especially for larger $M$.

3. Numerical Algorithms and Computational Considerations

The numerical scheme for this framework involves the following steps:

  1. State Discretization: The $M$-simplex of posterior probabilities is discretized, typically on a fine grid.
  2. Value Iteration: The value function is iterated via Bellman recursions, typically truncated at a finite horizon $N$ for tractability:

$$V^0(\pi) = h(\pi),\qquad V^{N+1}(\pi) = \min\{h(\pi),\; c(1-\pi_0) + (T V^{N})(\pi)\};$$

the iteration stops once successive approximations converge.

  3. Integration: Expectations over future observations are computed with system-specific densities, using the transition functions $D(\pi, x)$.
  4. Boundary Representation: The computed stopping regions $\Gamma$ are encoded offline (e.g., via splines or regression) for efficient online evaluation.
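The steps above can be made concrete for the simplest case $M = 1$ (quickest change detection) with binary observations; the costs, hazard rate, grid size, and Bernoulli channels below are illustrative assumptions only.

```python
import numpy as np

# Illustrative parameters (assumptions, not benchmark values).
p, c, a0 = 0.05, 1.0, 10.0             # hazard rate, delay cost, false alarm
f = np.array([[0.8, 0.2],              # f_0(x) for x = 0, 1
              [0.2, 0.8]])             # f_1(x) for x = 0, 1

grid = np.linspace(0.0, 1.0, 501)      # step 1: discretize pi_0 = P(theta > n)

def h(pi0):
    # Terminal loss: stopping incurs a0 only on a false alarm (theta > n).
    return a0 * pi0

def step(pi0, x):
    # One Bayes update of pi_0 after observing x in {0, 1}.
    u0 = (1 - p) * pi0 * f[0, x]
    u1 = (1 - (1 - p) * pi0) * f[1, x]
    return u0 / (u0 + u1)

V = h(grid)                            # V^0 = h
for _ in range(200):                   # step 2: truncated Bellman recursions
    cont = c * (1 - grid)
    for x in (0, 1):                   # step 3: integrate over observations
        px = (1 - p) * grid * f[0, x] + (1 - (1 - p) * grid) * f[1, x]
        cont += px * np.interp(step(grid, x), grid, V)
    V_new = np.minimum(h(grid), cont)
    if np.max(np.abs(V_new - V)) < 1e-9:
        V = V_new
        break
    V = V_new

# Step 4: stopping region = grid points where continuing is no cheaper.
stop = grid[V >= h(grid) - 1e-9]
```

For $M = 1$ the stopping region is an interval of small $\pi_0$ values (change already likely), so the offline boundary representation reduces to a single threshold.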

For systems with a moderate number of alternatives, the grid-based approach is feasible. For larger $M$, high-fidelity function approximators or Monte Carlo techniques, as treated in standard references on dynamic programming and regression, are advised.

4. Geometric Properties of the Optimal Strategy

The diagnosis benchmark highlights the rich geometric structure of the optimal stopping region:

  • For $M$ alternatives, $\Gamma$ resides in the $M$-simplex and typically consists of $M$ non-empty, convex, closed, bounded sets $\Gamma^{(j)}$, each corresponding to a distinct hypothesized regime.
  • The geometry, notably whether the stopping and continuation regions are connected or disconnected, depends on the cost structure and prior parameters.
  • Visualization for $M = 2, 3$ (the simplex is a triangle in the plane for $M = 2$ and a tetrahedron in 3D for $M = 3$) offers insight into how optimal policies deform with problem parameters, as seen in detailed figures in the original paper.

This geometric insight underpins both theoretical understanding and practical implementation, as offline boundary computation allows for efficient, accurate online deployment.

5. Special Cases and Unified Framework

SDBench’s Bayesian formulation subsumes classic problems:

  • Bayesian Change Detection (Shiryaev’s Problem): When $M = 1$, the problem reduces to quickest change detection with the standard Bayes risk.
  • Bayesian Sequential Multi-hypothesis Testing: When $p_0 = 1$ (the change occurs at time 0), the detection component disappears, and the problem matches Wald’s sequential multi-hypothesis testing (Arrow–Blackwell–Girshick).
  • The model provides a unified lens connecting previously disparate strands in sequential analysis, allowing direct comparison and informed parameterization for diverse diagnosis contexts.
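In the first special case, writing $a = a_{01}$ for the single false-alarm penalty and taking the correct-identification cost $a_{11} = 0$, the general risk collapses term by term to the classical Shiryaev objective:

$$R(\tau) = c\,\mathbb{E}[(\tau - \theta)^+] + a\,\mathbb{P}\{\tau < \theta\}.$$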

6. Application: Component Failure Detection with Suspended Animation

The benchmark extends to diagnose systems with multiple concealed components, such as industrial or defense systems experiencing “suspended animation” upon component failure:

  • Each component fails independently (geometric hazard rate); the system switches from the nominal regime ($f_0$) to a new regime ($f_\mu$) corresponding to the failed component.
  • The diagnostic goal becomes joint detection (failure occurrence) and identification (localization of the failed component), updating posteriors as observations accrue.
  • The same Bayesian optimal stopping machinery applies, allowing for dynamically updating beliefs about both timing and type of fault and optimizing intervention strategies accordingly.

The approach generalizes to counting and identifying multiple failures by appropriately encoding the diagnosis variable $\mu$.
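As a minimal sketch of data generation in this setting, assuming a uniform prior over which component fails, a shared geometric hazard rate, and Bernoulli observation channels (all illustrative choices):

```python
import random

def simulate(M=3, p=0.05, n_steps=200, seed=0):
    """Draw one trajectory of the suspended-animation model: observations
    come from f_0 until the disorder time theta, then from f_mu for the
    failed component mu (Bernoulli channels are illustrative only)."""
    rng = random.Random(seed)
    theta = 1                          # geometric disorder time
    while rng.random() >= p:
        theta += 1
    mu = rng.randrange(1, M + 1)       # uniformly chosen failed component
    def emit(k):
        # f_0 emits 1 w.p. 0.1; f_i emits 1 w.p. 0.1 + 0.8 * i / M.
        q = 0.1 if k == 0 else 0.1 + 0.8 * k / M
        return 1 if rng.random() < q else 0
    xs = [emit(0 if n < theta else mu) for n in range(1, n_steps + 1)]
    return theta, mu, xs
```

Pairing such trajectories with the posterior recursion of Section 1 gives the joint detection-and-identification loop described above.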

Conclusion

SDBench, as formalized in Bayesian sequential change diagnosis, provides a rigorous, unified, and practically computable framework for the joint detection and identification of abrupt, unobservable changes in stochastic systems. The benchmark enables the systematic evaluation of sequential diagnosis strategies, accommodating classic and modern problems, illuminating geometric properties of optimal policies, and guiding efficient numerical implementation. This both advances theoretical understanding and provides a robust, actionable reference point for developing and benchmarking real-world diagnostic systems across fields such as process control, defense, and automated fault isolation.