Online Conformal Testing

Updated 5 September 2025

Online conformal testing is a sequential, distribution-free approach that builds conformal martingales to detect deviations from randomness or exchangeability in real-time data streams.
It converts nonconformity scores into p-values and applies dynamic betting functions to update a martingale, ensuring controlled type I error rates via Ville's inequality.
Extensions include e-testing for direct evidence accumulation, differentiation of concept and label shifts, and practical deployment in cloud-based anomaly detection systems.

Online conformal testing is a sequential, distribution-free methodology for detecting and quantifying departures from key probabilistic assumptions—such as randomness or exchangeability—in both classical and modern statistical machine learning pipelines. It provides a rigorous framework for testing hypotheses about streaming data by constructing conformal martingales: stochastic processes, updated sequentially, that are sensitive to deviations from null hypotheses like independence or permutation invariance. The conformal testing paradigm delivers both finite-sample validity (frequentist control of false alarm rates) and adaptivity, making it applicable to online change detection, drift diagnosis, and real-time monitoring systems.

1. Foundations of Online Conformal Testing

Online conformal testing operates in the context of sequential hypothesis testing, where the null hypothesis is that the data stream is random or exchangeable. The classical null hypothesis of randomness posits that observations are i.i.d., while the slightly weaker exchangeability assumption requires only that the joint law be invariant under permutations of the observations. Online conformal testing recasts these hypothesis tests as a gambling process: at each time step, the conformal prediction framework turns the new observation's nonconformity score into a p-value, and a "betting" function, chosen before that p-value is seen, updates a test martingale (Vovk, 2019).

Formally, conformal martingales are nonnegative processes $(S_n)$ adapted to the filtration of the data stream, with $S_0=1$ and the property that under the null hypothesis (randomness or exchangeability), $\mathbb{E}[S_{n+1} \mid \mathcal{F}_n] = S_n$. If the null holds, the process is unlikely ever to become large; if the null is violated and the betting function suits the violation, the process tends to grow, and a large value of $S_n$ counts as evidence against the hypothesis.
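As a worked illustration of the martingale property, consider the usual product form. The notation here is an assumption made for the sketch: $p_i$ are the conformal p-values, $f_i$ is the betting function used at step $i$ (nonnegative, integrating to one over $[0,1]$, and chosen from $p_1,\dots,p_{i-1}$ only), and $\mathcal{F}_n$ is taken to be generated by $p_1,\dots,p_n$. Under the null, $p_{n+1}$ is uniform on $[0,1]$ and independent of the past, so

$$
S_n = \prod_{i=1}^{n} f_i(p_i), \qquad
\mathbb{E}[S_{n+1} \mid \mathcal{F}_n]
= S_n \, \mathbb{E}[f_{n+1}(p_{n+1}) \mid \mathcal{F}_n]
= S_n \int_0^1 f_{n+1}(u)\,du
= S_n .
$$

The normalization $\int_0^1 f_{n+1}(u)\,du = 1$ is what makes each bet fair under the null.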

Conformal test martingales generalize classical sequential change detection (e.g., CUSUM or Shiryaev–Roberts procedures) and underpin online anomaly detection engines, such as Microsoft's Azure Time Series Anomaly Detection module, by flagging when evidence accumulates indicating a change in distribution (Vovk, 2019).

2. Conformal Martingales: Construction and Sequential Testing

The construction of online conformal tests proceeds in three stages (Vovk, 2019; Vovk, 2020):

Nonconformity Score Calculation: For each incoming data point $z_n$, a nonconformity function $A$ outputs a score $\alpha_n$ that quantifies how "atypical" the point appears relative to the data observed so far.
Conformal Transducer (p-value generation): The nonconformity scores are converted into p-values by a permutation-invariant ranking of $\alpha_n$ among all scores computed so far, with random tie-breaking. Under the null hypothesis of exchangeability, these randomized p-values are independent and uniformly distributed on $[0,1]$.
Betting Function and Martingale Update: At each step, a betting function (nonnegative, integrating to one over $[0,1]$, and possibly depending on earlier p-values) is applied to the new p-value to "gamble" against the null hypothesis, and the running product of these factors forms the conformal martingale $S_n$.
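The three stages can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the construction used in any particular paper or product: the nonconformity function `abs_dev_from_mean`, the power betting function `power_bet` with a fixed exponent, and all function names are hypothetical choices made for the example. Scores are recomputed against the whole bag at every step so that the ranking stays permutation-invariant.

```python
import random

def abs_dev_from_mean(z, bag):
    """Hypothetical nonconformity score: distance of z from the mean of the bag."""
    mean = sum(bag) / len(bag)
    return abs(z - mean)

def smoothed_p_value(scores, rng):
    """Randomized p-value of the last score, ranked among all scores."""
    last = scores[-1]
    greater = sum(1 for s in scores if s > last)
    equal = sum(1 for s in scores if s == last)  # counts the last score itself
    tau = rng.random()                           # random tie-breaking in [0, 1)
    return (greater + tau * equal) / len(scores)

def power_bet(p, eps=0.5):
    """Betting function eps * p**(eps - 1); it integrates to 1 over [0, 1]."""
    return eps * max(p, 1e-12) ** (eps - 1.0)    # guard against p == 0

def conformal_martingale(stream, nonconformity=abs_dev_from_mean,
                         bet=power_bet, seed=0):
    """Return the path S_1, ..., S_n of the product martingale."""
    rng = random.Random(seed)
    bag, path, s = [], [], 1.0
    for z in stream:
        bag.append(z)
        # Stage 1: nonconformity scores of every point relative to the full bag.
        scores = [nonconformity(x, bag) for x in bag]
        # Stage 2: conformal transducer turns the newest score into a p-value.
        p = smoothed_p_value(scores, rng)
        # Stage 3: multiply the capital by the betting function's value.
        s *= bet(p)
        path.append(s)
    return path
```

Calling `conformal_martingale(stream)` on a list of numbers returns one martingale value per observation. Small p-values (atypical points) multiply the capital by a factor above one, while typical points shrink it. Recomputing all scores at each step costs quadratic time, which is acceptable for a sketch but not for long streams.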

Classical online change-point detection and anomaly detection can be cast in this conformal framework. The simplest procedure raises an alarm when the test martingale crosses a predefined threshold $c$; its validity (control of type I error) is ensured by Ville's inequality, which bounds the probability that the martingale ever reaches $c$ under the null by $1/c$. Conformal versions of the CUSUM and Shiryaev–Roberts procedures instead monitor statistics derived from the martingale and raise an alarm when these cross a threshold (Vovk, 2019).
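The alarm rules can be compared on a toy sequence. The sketch below assumes the commonly used forms of the statistics: the CUSUM statistic is the maximum of $S_n/S_i$ over $i<n$ and the Shiryaev–Roberts statistic is the sum of the same ratios, both computed recursively from the per-step betting factors $S_n/S_{n-1}$. The function name `first_alarm` and its arguments are hypothetical.

```python
def first_alarm(factors, threshold, rule="cusum"):
    """Return the 1-based step of the first alarm, or None if none is raised.

    factors: per-step betting factors S_n / S_{n-1}.
    rule: "martingale" (plain product), "cusum", or "sr" (Shiryaev-Roberts).
    """
    stat = 1.0 if rule == "martingale" else 0.0
    for n, f in enumerate(factors, start=1):
        if rule == "martingale":
            stat *= f                  # S_n itself
        elif rule == "cusum":
            stat = f * max(stat, 1.0)  # max over i < n of S_n / S_i
        elif rule == "sr":
            stat = f * (stat + 1.0)    # sum over i < n of S_n / S_i
        else:
            raise ValueError(f"unknown rule: {rule}")
        if stat >= threshold:
            return n
    return None

# Two losing bets followed by four winning ones.
factors = [0.5, 0.5, 2.0, 2.0, 2.0, 2.0]
print(first_alarm(factors, 8.0, "martingale"))  # None
print(first_alarm(factors, 8.0, "cusum"))       # 5
print(first_alarm(factors, 8.0, "sr"))          # 4
```

The plain martingale is held back by capital lost before the change, whereas the CUSUM and Shiryaev–Roberts statistics effectively restart the bet. Note that the $1/c$ bound from Ville's inequality applies to the plain martingale rule; the restarted statistics are usually assessed through the frequency of false alarms instead.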

A practical instantiation is found in industrial modules—cloud-based anomaly detection pipelines regularly deploy conformal martingale machinery to provide real-time, valid, and interpretable alarms for distributional changes (Vovk, 2019).

3. Extensions: E-Testing, Concept Shift, and Decomposed Martingales

Recent work has extended the online conformal testing framework in several directions:

Conformal e-Testing: This approach replaces p-values with e-values—non-negative statistics with $\mathbb{E}[E] \leq 1$ under the null—offering flexible, direct accumulation of evidence (Vovk et al., 2020). E-testing enjoys the advantage of not requiring randomization for validity, simplifies test construction, and allows practical procedures such as the conformal CUSUM e-procedure for fast change detection. However, in online settings, simple products of e-values may lack strong martingale validity (Ville's inequality may fail), and efficiency can degrade due to "decay" after distributional shifts.
Concept Shift and Label Shift Detection: By using label-conditional conformal transducers, one can construct exchangeability martingales that are selectively sensitive to concept shift (change in $P(X \mid Y)$), label shift (change in $P_Y$), or both. This is operationalized by decomposing the evidence in the conformal martingale into two independent components, facilitating diagnosis of the nature of dataset shift (Vovk, 2020). An explicit product-martingale construction is given, and empirical demonstrations on USPS digits confirm the ability to distinguish drift sources.
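To make the label-conditional idea concrete, the sketch below ranks the newest nonconformity score only among earlier examples that share its label. It assumes the standard label-conditional (Mondrian) form of the randomized p-value; the function name and the toy data are hypothetical, and the full decomposition into two martingales is not reproduced here.

```python
import random

def label_conditional_p_value(scores, labels, rng):
    """Randomized p-value of the last example within its own label class."""
    y, a = labels[-1], scores[-1]
    # Keep only the scores of examples carrying the same label as the newest one.
    same = [s for s, lab in zip(scores, labels) if lab == y]
    greater = sum(1 for s in same if s > a)
    equal = sum(1 for s in same if s == a)  # includes the newest example
    return (greater + rng.random() * equal) / len(same)

rng = random.Random(0)
scores = [0.1, 5.0, 0.2, 0.3]
labels = ["a", "b", "a", "a"]
p = label_conditional_p_value(scores, labels, rng)
```

In this toy input the large score 5.0 belongs to label "b", so it does not affect the p-value of the newest "a" example. Because the ranking never compares examples across labels, a change in label frequencies alone leaves these p-values unaffected, which is the intuition behind targeting concept shift.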

These extensions broaden the reach of online conformal testing into fine-grained drift detection, robust monitoring, and adaptive statistical process control.

4. Validity, Efficiency, and Theoretical Guarantees

Online conformal testing is characterized by two primary statistical properties (Vovk, 2019; Vovk et al., 2020):

Validity: Procedures based on conformal martingales provide finite-sample control over type I error (false alarms) via Ville's inequality. For any nonnegative conformal martingale $(S_n)$ with $S_0 = 1$ and any $c > 1$, under the null, $P(\sup_n S_n \geq c) \leq 1/c$. In e-testing variants, batch-mode validity is ensured by $\mathbb{E}[E(\cdot)] \leq 1$ for bona fide e-variables, though sequential validity may require multistage or CUSUM-type procedures.
Efficiency: The detection power and speed are governed by the choice of nonconformity measure, the adaptivity of the betting function, and the structure of the data. Efficiency comparisons (e.g., between e-testing and classical conformal martingale testing) reveal potential trade-offs: e-testing may lose strong sequential validity and tend to be less adaptive, while conformal martingales require randomization and a more complex construction to guarantee uniform validity in the online regime (Vovk et al., 2020).
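The validity bound can be checked empirically with a small Monte Carlo experiment. The sketch below simulates the null directly by drawing independent uniform p-values and betting with a power function; the function name, the default parameters, and the choice of betting function are assumptions made for the illustration.

```python
import random

def false_alarm_rate(threshold=20.0, horizon=200, runs=2000, eps=0.5, seed=1):
    """Fraction of null runs in which the martingale reaches the threshold."""
    rng = random.Random(seed)
    alarms = 0
    for _ in range(runs):
        s = 1.0
        for _ in range(horizon):
            p = 1.0 - rng.random()        # uniform on (0, 1], as under the null
            s *= eps * p ** (eps - 1.0)   # power betting function
            if s >= threshold:
                alarms += 1
                break
    return alarms / runs

rate = false_alarm_rate()
print(rate, "<=", 1.0 / 20.0)
```

Ville's inequality says the true false alarm probability is at most `1 / threshold`, so the estimated rate should come out at or below 0.05 up to simulation noise. Changing `eps` or the horizon alters how conservative the bound is, not whether it holds.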

Efficiency is further influenced by design choices for the nonconformity function, as altering its sensitivity can trade off the power for detecting different kinds of shifts (e.g., increasing detection of label shift may blunt detection of concept shift) (Vovk, 2020). Practical procedures—conformal versions of CUSUM or Shiryaev–Roberts—have been analyzed for both type I error control and detection delay.

5. Practical Applications and Industrial Integration

Online conformal testing has significant implications in practical settings:

Real-time Change and Anomaly Detection: Online conformal martingale-based detectors are embedded in cloud services (e.g., Microsoft Azure's Time Series Anomaly Detection module) and industrial platforms, providing mechanisms for automatic drift and fault monitoring (Vovk, 2019).
Adaptive Monitoring: These procedures ensure that machine learning models deployed in dynamic environments remain valid and can quickly flag performance deterioration, prompting retraining or intervention.
Self-diagnosis and Model Monitoring: By decomposing conformal martingales, practitioners can attribute detected drift to specific causes (concept or label shifts), which is foundational for actionable responses in complex pipelines (Vovk, 2020).

The distribution-free nature of conformal testing enables deployment in domains where the data-generating mechanism is uncertain or nonstationary, such as finance, predictive maintenance, and cyber-physical systems.

6. Methodological Considerations and Limitations

Several methodological aspects and limitations frame the usage of online conformal testing:

Randomization: Classical conformal testing relies on randomized p-values to guarantee uniform marginal behavior and validity, whereas conformal e-testing provides direct evidence accumulation without randomization (Vovk et al., 2020). However, the absence of randomization may complicate strong sequential validity.
Adaptivity: Classical test martingales permit adaptivity of the betting function to observed evidence; e-testing procedures typically do not incorporate past outcomes, which may reduce responsiveness to changing conditions.
Martingale Decomposition and Model Reduction: Conformal testing is inextricably linked to the deliberate reduction ("forgetting") of data. Rather than working with the full data filtration, conformal methods operate on reduced summaries (e.g., p-values), and this reduction is required for effective hypothesis testing under minimally constrained nulls such as exchangeability (Vovk, 2024). This perspective is pivotal for unifying conformal testing with more general martingale-based sequential testing methodologies.

Future directions highlighted include systematic comparison and optimization of nonconformity measures, extension to more general group-invariant structures, assessment of statistical power versus information reduction, and comprehensive treatment of adversarial or highly nonstationary regimes. The study of the balance between efficiency and validity, especially under model misspecification and real-world constraints, remains central for advancing online conformal testing.

7. Summary Table: Core Components of Online Conformal Testing

Component	Functionality	Notable Reference
Conformal Martingale	Accumulates evidence for/against null sequentially	(Vovk, 2019)
Nonconformity Score	Quantifies atypicality for each new observation	(Vovk, 2019)
Conformal Transducer	Converts scores to p-values (permutation-invariant)	(Vovk, 2020)
Randomization	Ensures uniformity and sequential validity	(Vovk et al., 2020)
e-Testing	Accumulates e-values, avoids randomization	(Vovk et al., 2020)
Martingale Decomposition	Separates concept and label shift detection	(Vovk, 2020)
Validity (Ville's Inequality)	Controls type I error via maximal martingale bound	(Vovk, 2019)
Efficiency	Detection power and response speed (depends on design)	(Vovk et al., 2020)

This table encapsulates the main components and their roles within the online conformal testing framework, referencing the primary foundational works.

References (4)

Testing randomness (2019)

Testing for concept shift online (2020)

Conformal e-testing (2020)

The power of forgetting in statistical hypothesis testing (2024)