Higher Criticism Statistic

Updated 12 November 2025
  • The higher criticism (HC) statistic aggregates and evaluates small p-values to detect sparse, weak signals in high-dimensional data.
  • It leverages both empirical distributions and asymptotic theory to establish optimal detection boundaries in multiple-testing frameworks.
  • Practical applications include multi-stream change-point detection with careful threshold calibration to minimize detection delays.

The higher criticism (HC) statistic is a central tool for large-scale detection of rare and weak signals, particularly within high-dimensional multiple-testing, change-point detection, and signal recovery problems. Introduced by Donoho and Jin, HC quantifies the aggregate excess of small $p$-values relative to the null expectation, enabling detection of alternatives where only a small, unknown fraction of features or streams are affected. The statistic has a rich mathematical structure, precise asymptotic theory, and connections to optimal detection boundaries in sparse regimes.

1. Definition and Formulation

Let $p_1, \dots, p_n$ be one-sided $p$-values, sorted so that $p_{(1)} \leq p_{(2)} \leq \cdots \leq p_{(n)}$. The canonical higher criticism statistic is

$$\mathrm{HC}_n = \max_{1 \leq i \leq n} \frac{\frac{i}{n} - p_{(i)}}{\sqrt{p_{(i)}\,(1-p_{(i)})/n}}$$

In large-scale testing applications, it is customary to restrict $i$ to $1 \leq i \leq \alpha_0 n$ for some fixed $\alpha_0 < 1$; this restricts attention to the smallest $p$-values, which are most informative under sparse alternatives.

HC can also be written functionally in terms of the empirical distribution function $\widehat{F}(x) = n^{-1} \sum_{j=1}^n \mathbf{1}\{p_j \leq x\}$:
$$\widehat{\mathrm{HC}}(x) = \frac{\left|\widehat{F}(x) - x\right|}{\sqrt{\widehat{F}(x)\,(1-\widehat{F}(x))/n}}, \qquad \mathrm{HC}_n^* = \max_{1 \leq i \leq n} \widehat{\mathrm{HC}}(p_{(i)}).$$
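As a concrete illustration, the sorted-$p$-value form above can be computed in a few lines. The sketch below is not from the source: it assumes NumPy, the function name is mine, and the clipping guard is a practical addition rather than part of the formal definition.

```python
import numpy as np

def higher_criticism(pvals, alpha0=1.0):
    """HC_n = max over the alpha0*n smallest p-values of
    (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i)) / n)."""
    n = len(pvals)
    # Clip to avoid division by zero at p = 0 or p = 1 (practical guard,
    # not part of the formal definition).
    p = np.clip(np.sort(pvals), 1e-12, 1 - 1e-12)
    i = np.arange(1, n + 1)
    hc = (i / n - p) / np.sqrt(p * (1 - p) / n)
    kmax = max(1, int(alpha0 * n))   # restrict to the smallest p-values
    return hc[:kmax].max()
```

Under the null the returned value is typically of moderate size (order $\sqrt{2\log\log n}$); an excess of small $p$-values drives it sharply upward.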

In the context of sequential or multi-stream problems, the statistic is applied at each time $t$ to the set of per-stream $p$-values, leading to a sequence $\{\mathrm{HC}_t^\star\}$.

2. Operational Principle and Detection Boundary

HC is designed to optimally detect the presence of a sparse mixture, where an unknown, vanishing fraction of the population exhibits a weak deviation. Under the rare/weak normal means model, $X_j \sim (1-\epsilon)\,N(0,1) + \epsilon\,N(\mu,1)$ with $\epsilon = n^{-\beta}$ and $\mu = \sqrt{2r\log n}$, there is a sharp "detection boundary" in the $(\beta, r)$ parameter space:
$$\rho^*(\beta) = \begin{cases} \beta - \tfrac{1}{2}, & \tfrac{1}{2} < \beta < \tfrac{3}{4}, \\ \left(1 - \sqrt{1-\beta}\right)^2, & \tfrac{3}{4} \leq \beta < 1. \end{cases}$$
For $r > \rho^*(\beta)$, HC is fully powered: the sum of asymptotic type I and type II errors tends to 0. Below this curve, no test, including HC, is powerful.

In heteroscedastic or multi-stream settings, the boundary generalizes. Let $p = N^{-\beta}$ be the affected-stream fraction, $\mu = \sqrt{2r\log N}$, and let $\sigma^2$ denote the post-change variance:
$$\rho^*(\beta, \sigma) = \begin{cases} (2-\sigma^2)\left(\beta-\tfrac{1}{2}\right), & \tfrac{1}{2}<\beta<1-\tfrac{\sigma^2}{4},\quad 0<\sigma^2<2, \\ \left(1-\sigma\sqrt{1-\beta}\right)^2, & 1-\tfrac{\sigma^2}{4}\leq\beta<1,\quad 0<\sigma^2<2, \\ 0, & \tfrac{1}{2}<\beta<1-\tfrac{1}{\sigma^2},\quad \sigma^2\geq 2, \\ \left(1-\sigma\sqrt{1-\beta}\right)^2, & 1-\tfrac{1}{\sigma^2}\leq\beta<1,\quad \sigma^2\geq 2. \end{cases}$$
This boundary governs the minimum detection delay in multi-stream quickest change-point detection (Gong et al., 2024).
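The piecewise boundary can be encoded directly. The sketch below (function name and structure are my own, not from the source) also illustrates the implied minimal delay $\lceil \rho^*/r \rceil$ for one parameter choice:

```python
import math

def rho_star(beta, sigma=1.0):
    """Detection boundary rho*(beta, sigma) for 1/2 < beta < 1."""
    if not 0.5 < beta < 1.0:
        raise ValueError("beta must lie in (1/2, 1)")
    s2 = sigma ** 2
    if s2 < 2:
        if beta < 1 - s2 / 4:
            return (2 - s2) * (beta - 0.5)
    else:
        if beta < 1 - 1 / s2:
            return 0.0          # undetectably weak regime
    return (1 - sigma * math.sqrt(1 - beta)) ** 2

# Example: beta = 0.6, sigma = 1 gives rho* = 0.1; with signal strength
# r = 0.04 the minimal delay is ceil(0.1 / 0.04) = 3 time steps.
delay = math.ceil(rho_star(0.6) / 0.04)
```

Note that setting $\sigma = 1$ recovers the Donoho–Jin boundary $\rho^*(\beta)$ above.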

3. Calculation of Stream-wise $p$-values and Multi-stream Aggregation

In multi-stream change-point detection, consider observations $X_{n,t}$, $n=1,\dots,N$, $t \in \mathbb{N}$, where under the null $X_{n,t} \sim N(0,1)$, and under the alternative a sparse, unknown subset of streams undergoes a post-$\tau$ shift to $N(\mu,\sigma^2)$. The per-stream $p$-values depend on the underlying detection statistic:

  • CUSUM / Likelihood-Ratio (LR) Statistic ($\mu$ known, $\sigma=1$):

$$Y_t^{LR} = \max_{0 \leq k < t} \left[\mu\,(S_t - S_k) - \tfrac{\mu^2}{2}(t-k)\right], \qquad S_t = \sum_{s=1}^t X_s$$

The $p$-value is $\pi_t^{LR}(x) = \mathbb{P}(Y_t^{LR} \geq x \mid H_0)$.

  • Generalized Likelihood-Ratio (GLR) Statistic ($\mu$ unknown, $\sigma=1$):

$$Y_t^{GLR} = \max_{t-w < k < t} \frac{|S_t - S_k|}{\sqrt{t-k}}$$
where $w$ is a fixed window length.

The $p$-value is $\pi_t^{GLR}(x) = \mathbb{P}(Y_t^{GLR} \geq x \mid H_0)$.

For each time $t$, the set $\{\pi_{n,t}\}_{n=1}^N$ is constructed and ordered, and HC is applied to aggregate evidence across all streams.
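A sketch of the GLR statistic and a Monte Carlo estimate of its null $p$-value follows. This is not from the source: the function names are mine, and simulating the null distribution is one practical stand-in for the exact distribution referenced above.

```python
import numpy as np

def glr_statistic(x, w):
    """Y_t^GLR = max_{t-w < k < t} |S_t - S_k| / sqrt(t - k),
    for the full sample x of length t."""
    t = len(x)
    S = np.concatenate(([0.0], np.cumsum(x)))   # S[k] = X_1 + ... + X_k
    ks = np.arange(max(0, t - w + 1), t)        # candidate change points k
    return np.max(np.abs(S[t] - S[ks]) / np.sqrt(t - ks))

def glr_pvalue(y, t, w, n_null=2000, rng=None):
    """Monte Carlo estimate of pi_t^GLR(y) = P(Y_t^GLR >= y | H0)."""
    rng = np.random.default_rng(rng)
    null = np.array([glr_statistic(rng.standard_normal(t), w)
                     for _ in range(n_null)])
    # Add-one correction keeps the estimate strictly positive.
    return (1 + np.sum(null >= y)) / (1 + n_null)
```

In practice the null table can be simulated once per $(t, w)$ and reused across all $N$ streams.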

4. Stopping Rule, False Alarm Control, and Detection Delay

The global detection procedure is

$$T = \inf\{\, t \geq 1 : \mathrm{HC}_t^\star > b_t \,\}$$

where $\mathrm{HC}_t^\star$ is the HC statistic at time $t$ over streams $n = 1,\dots,N$, and $b_t$ is a threshold chosen to guarantee a desired false-alarm rate (often taken constant).

Threshold Calibration:

  • Under the null, one chooses $b_t = b(N)$ so that $\sup_t \mathbb{P}(\mathrm{HC}_t^\star > b \mid H_0) \rightarrow 0$ as $N \rightarrow \infty$.
  • This can be achieved via Monte Carlo or from the large-sample null theory of HC, which gives asymptotic Gumbel-type distributions.
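A minimal Monte Carlo calibration sketch (the quantile rule and all names below are illustrative assumptions, not the source's procedure): simulate i.i.d. Uniform(0,1) $p$-values under the null, compute HC, and take a high quantile as $b(N)$.

```python
import numpy as np

def calibrate_threshold(N, n_reps=1000, alpha=0.01, alpha0=0.5, seed=0):
    """Estimate b(N) as the (1 - alpha) null quantile of HC* computed
    from N i.i.d. Uniform(0,1) p-values."""
    rng = np.random.default_rng(seed)
    i = np.arange(1, N + 1)
    stats = np.empty(n_reps)
    for r in range(n_reps):
        p = np.sort(rng.uniform(size=N))
        hc = (i / N - p) / np.sqrt(p * (1 - p) / N)
        stats[r] = hc[: max(1, int(alpha0 * N))].max()
    return np.quantile(stats, 1 - alpha)
```

For sequential use, $\alpha$ should account for repeated testing across time steps (e.g., via a smaller per-step level).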

Detection Delay:

  • When a change occurs at unknown time $\tau$, with $p = N^{-\beta}$ affected streams and mean shift $\mu = \sqrt{2r\log N}$, the delay converges in distribution: $T-\tau \stackrel{d}{\rightarrow} \left\lceil \rho^*(\beta, \sigma)/r \right\rceil$.
  • Under $H_0$, no alarm occurs with probability tending to 1.

Key Theorem (Gong–Kipnis–Xie; Gong et al., 2024):

  • There exists $b(N)$ such that (i) $\mathbb{P}(T<\infty \mid H_0)\to 0$, and (ii) $\mathbb{P}(T-\tau = \lceil \rho^*(\beta, \sigma)/r \rceil \mid H_1) \to 1$.
  • Uniformly over $\tau$, the worst-case expected detection delay satisfies

$$\sup_\tau \mathbb{E}\left[\,T-\tau \mid T \geq \tau\,\right] = \lceil \rho^*(\beta, \sigma)/r \rceil + o(1)$$

5. Proof Techniques and Moderate Deviations Analysis

The proof combines:

  • Uniformity under $H_0$: the per-stream $p$-values are i.i.d. Uniform(0,1), so the maximal HC is bounded by the threshold with high probability.
  • Under $H_1$: the $N^{1-\beta}$ affected streams yield $p$-values exhibiting moderate-deviation (or log-$\chi^2$) behavior:

$$-2\log \pi_{n,t} \overset{d}{=} \left(\sigma Z + \mu\sqrt{t - \tau + 1}\right)^2 (1 + o_p(1))$$

where $Z$ is standard normal. This drives a localized excess of small $p$-values detectable by HC.

  • Classical HC power analysis (the Donoho–Jin framework) demonstrates that detection occurs as soon as $r(t-\tau+1) > \rho^*(\beta, \sigma)$, pinning down the minimal delay.

This approach generalizes to the heteroscedastic case ($\sigma \neq 1$), accommodating unknown post-change variances.

6. Implementation, Calibration, and Tuning Considerations

Algorithmic Steps:

  1. For each time $t$ and each stream $n$, compute a change-point detection statistic ($Y_{n,t}^{LR}$ or $Y_{n,t}^{GLR}$).
  2. Calculate per-stream $p$-values using the exact null distribution.
  3. Collect and sort these $p$-values; compute $\mathrm{HC}_t^\star$ using a rank cutoff (e.g., the $\alpha_0 N$ smallest $p$-values).
  4. Signal a detected change if $\mathrm{HC}_t^\star > b_t$.
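The four steps can be wired together as follows. This sketch is illustrative, not the source's implementation: all function names are mine, the per-stream statistic is a standardized trailing-window mean used as a simple stand-in for the LR/GLR scans, and $p$-values come from the exact $N(0,1)$ tail.

```python
import math
import numpy as np

def hc_star(pvals, alpha0=0.5):
    """HC* over the alpha0-fraction of smallest p-values (clipped guard)."""
    n = len(pvals)
    p = np.clip(np.sort(pvals), 1e-12, 1 - 1e-12)
    i = np.arange(1, n + 1)
    hc = (i / n - p) / np.sqrt(p * (1 - p) / n)
    return hc[: max(1, int(alpha0 * n))].max()

def detect_change(X, b, w=20, alpha0=0.5):
    """Steps 1-4 over X of shape (N streams, T time steps); returns the
    stopping time T (1-based) or None if HC* never exceeds b."""
    N, T = X.shape
    for t in range(1, T + 1):
        k = min(w, t)
        # Step 1: standardized trailing-window sum, ~N(0,1) under H0.
        y = X[:, t - k:t].sum(axis=1) / math.sqrt(k)
        # Step 2: one-sided normal p-values, P(Z >= y).
        pvals = 0.5 * np.array([math.erfc(v / math.sqrt(2)) for v in y])
        # Steps 3-4: aggregate by HC and compare to the threshold.
        if hc_star(pvals, alpha0) > b:
            return t
    return None
```

After a strong shift in a sparse subset of streams, the affected windows produce extreme z-scores, hence tiny $p$-values and a sharp jump in $\mathrm{HC}^\star$, triggering the alarm shortly after $\tau$.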

Threshold $b_t$ determination:

  • $b_t$ can be set empirically via Monte Carlo under the null model, or via asymptotic approximations: for large $N$, $\mathrm{HC}_N$ is approximately Gumbel-distributed, with typical magnitude $O(\sqrt{2\log\log N})$.
  • Convergence to the limiting null distribution is slow for finite $N$, so empirical calibration is often preferred when stringent false-alarm control is required.

Practical recommendations:

  • For large $N$, restrict the maximization to $i \leq \alpha_0 N$ (e.g., $\alpha_0 = 0.2$ or $0.5$) to avoid instability from extreme order statistics of the $p$-values.
  • Under strong dependence or heteroscedastic variance, verify that the null $p$-values remain (approximately) uniform.
  • Computational cost is $O(N\log N)$ per time step (dominated by sorting and the scan over ranks).

7. Significance, Limitations, and Comparison to Information-Theoretic Bounds

Significance:

  • HC attains the optimal (information-theoretic) detection delay for sparse change-point detection, without requiring knowledge of which streams are affected or of the precise value of $\mu$.
  • The approach extends to general settings (unknown variance, weak or moderate signals, heteroscedasticity) as long as stream-wise $p$-values are exactly or approximately uniform under the null.

Limitations:

  • When the fraction of affected streams $p$ is not sparse (i.e., $\beta$ near 0), HC is suboptimal compared to bulk-averaging procedures.
  • Under heavy-tailed or serially dependent data, uniformity of the $p$-values may fail, requiring careful model checking or adaptation.
  • In the very low-count regime, phase transitions in detectability become non-Gaussian, and HC may require thresholding or cell selection for optimality (see Chan, 2023).

Comparison:

  • In the special case $\sigma=1$, the HC-based procedure matches the delay lower bound derived in prior work (Chan, 2017), achieving minimax optimality among sequential detectors.
  • The derived phase diagram in $(\beta, r)$ coincides precisely with the Donoho–Jin boundary, generalizing the result from rare/weak mean testing in high dimensions to quickest change detection in multi-stream scenarios.

Summary Table: Detection Delay and Boundary

| Model parameterization | Detection delay | Detection boundary $\rho^*$ |
|---|---|---|
| $p=N^{-\beta}$, $\mu=\sqrt{2r\log N}$, $\sigma^2=1$ | $T-\tau \to \lceil\rho^*(\beta,1)/r\rceil$ | $\beta-\tfrac{1}{2}$ for $\tfrac{1}{2}<\beta<\tfrac{3}{4}$; $(1-\sqrt{1-\beta})^2$ for $\tfrac{3}{4}\leq\beta<1$ |
| $p=N^{-\beta}$, $\mu=\sqrt{2r\log N}$, $\sigma^2\neq 1$ | $T-\tau \to \lceil\rho^*(\beta,\sigma)/r\rceil$ | See $\rho^*(\beta,\sigma)$ in Section 2 |

HC for multi-stream change-point detection achieves the theoretical detection delay lower bound under general settings, provided careful calibration and accurate pp-value computation are maintained. This framework is robust, adaptive, and achieves rate-optimal performance without requiring explicit signal localization.
