
False Discovery Rate Control

Updated 8 December 2025
  • False Discovery Rate (FDR) is the expected ratio of false rejections to total rejections, providing a foundational metric in multiple testing correction.
  • Competition-based methods such as target–decoy and knockoff frameworks use score comparisons to estimate and manage the false discovery proportion effectively.
  • Advanced procedures like FDP-SD and prediction bands offer probabilistic guarantees on the realized FDP, ensuring high-confidence control in practical applications.

False Discovery Rate (FDR) control is the cornerstone of modern multiple testing, designed to bound the expected proportion of false positives among rejected hypotheses. While procedures such as the Benjamini–Hochberg method ensure that the expected false discovery proportion (FDP) does not exceed a nominal level α, the realized FDP in any specific dataset may substantially exceed α. In the context of competition-based FDR control—chiefly, target–decoy competition (TDC) in proteomics and "knockoff"-based variable selection in regression—a combinatorial structure enables robust FDR control, but further tools are needed for high-confidence control of the realized FDP. Methods such as FDP-SD ("FDP step-down") provide probabilistic guarantees on the FDP, closing the gap between average-case and probabilistic control.

1. Formal Definitions and Competition-Based Frameworks

Let $m$ hypotheses be tested, with $R$ the number of rejections and $V$ the number of false rejections (true nulls rejected). The False Discovery Proportion is

$$\mathrm{FDP} = \frac{V}{R} \quad (\mathrm{FDP} = 0 \text{ if } R = 0),$$

and the False Discovery Rate is its expectation, $\mathrm{FDR} = \mathbb{E}[\mathrm{FDP}]$. FDR control at level $\alpha$ ensures $\mathrm{FDR} \leq \alpha$.
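As a small illustration of the definition, the following Python sketch computes the FDP of a single rejection list, including the convention that the FDP is 0 when nothing is rejected. The function name and the boolean-list inputs are hypothetical choices made for this example; in real data the true-null status is unknown, so this is only usable in simulations.

```python
from typing import Sequence


def false_discovery_proportion(rejected: Sequence[bool],
                               is_true_null: Sequence[bool]) -> float:
    """FDP = V / R, with FDP = 0 when R = 0 (hypothetical helper)."""
    # R: total number of rejections.
    r = sum(1 for rej in rejected if rej)
    # V: rejections that are in fact true nulls.
    v = sum(1 for rej, null in zip(rejected, is_true_null) if rej and null)
    return v / r if r > 0 else 0.0


# Three rejections, one of which is a true null.
fdp = false_discovery_proportion(
    rejected=[True, True, True, False],
    is_true_null=[False, True, False, True],
)
```

Averaging this quantity over many simulated datasets would approximate the FDR.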

In Target–Decoy Competition (TDC) frameworks, each hypothesis (e.g., peptide-spectrum match or feature) is assigned a "target" score $Z_i$ (signal) and a "decoy" score $\tilde Z_i$ (null), typically constructed by shuffling the data or by synthetically generating knockoff variables. For each hypothesis:

  • $W_i = \max\{Z_i, \tilde Z_i\}$ is the "winning" score.
  • $L_i = +1$ if $Z_i > \tilde Z_i$ (target wins), $L_i = -1$ if $Z_i < \tilde Z_i$ (decoy wins).

The hypotheses are sorted in decreasing order of $W_i$. For any $k$,

$$D_k = \sum_{i=1}^{k} 1_{\{L_i = -1\}}, \qquad T_k = k - D_k.$$

TDC estimates the FDP among the top $k$ hypotheses as $\widehat{\mathrm{FDP}}(k) = (D_k + 1)/T_k$ and reports all target wins up to

$$k_\text{TDC} = \max \left\{k : \widehat{\mathrm{FDP}}(k) \leq \alpha \right\}.$$

Under the key assumption that, for true nulls, the labels $L_i$ are i.i.d. Rademacher ($\pm 1$ with equal probability; exchangeability), the procedure guarantees FDR control at level $\alpha$ (Luo et al., 2020).
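The TDC selection rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `tdc_select` is hypothetical, ties between target and decoy scores (which the text does not address) are counted as decoy wins here, and prefixes with no target wins are skipped to avoid dividing by zero.

```python
from typing import List, Sequence, Tuple


def tdc_select(z: Sequence[float], z_tilde: Sequence[float],
               alpha: float) -> Tuple[int, List[int]]:
    """Return (k_TDC, original indices of the reported target wins)."""
    scored = []
    for idx, (zi, zti) in enumerate(zip(z, z_tilde)):
        w = max(zi, zti)                 # winning score W_i
        label = 1 if zi > zti else -1    # L_i; ties treated as decoy wins (assumption)
        scored.append((w, label, idx))
    scored.sort(key=lambda item: item[0], reverse=True)  # decreasing W_i

    k_tdc = 0
    decoy_wins = 0                       # running D_k
    for k, (_, label, _) in enumerate(scored, start=1):
        if label == -1:
            decoy_wins += 1
        target_wins = k - decoy_wins     # T_k
        # Keep the largest k whose estimated FDP (D_k + 1) / T_k is <= alpha.
        if target_wins > 0 and (decoy_wins + 1) / target_wins <= alpha:
            k_tdc = k

    discoveries = [idx for (_, label, idx) in scored[:k_tdc] if label == 1]
    return k_tdc, discoveries


# Toy data: one decoy win (index 5) ranked sixth by winning score.
z = [10, 9, 8, 7, 6, 1, 5, 4, 3, 2]
z_tilde = [0, 0, 0, 0, 0, 5.5, 0, 0, 0, 0]
k_loose, found_loose = tdc_select(z, z_tilde, alpha=0.25)
k_strict, found_strict = tdc_select(z, z_tilde, alpha=0.2)
```

Note that the rule takes the largest qualifying $k$, so the estimated FDP may temporarily exceed $\alpha$ at smaller $k$ and still be admitted later, as happens in the toy data at $\alpha = 0.25$.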

In the knockoff framework for linear regression, a synthetic "knockoff" feature matrix $\tilde X$ is constructed with exchangeability: null features are indistinguishable from their knockoffs. Each feature/knockoff pair yields statistics $(Z_i, \tilde Z_i)$, and the entire procedure mirrors the TDC logic (Luo et al., 2020).

2. Limitation of FDR and the Need for High-Confidence FDP Control

While competition-based FDR procedures robustly control the expected value, $\mathbb{E}[\mathrm{FDP}] \leq \alpha$, they provide no guarantee that the realized FDP in any given discovery list is near $\alpha$. Instances where $\mathrm{FDP} \gg \alpha$ can occur with non-negligible probability, which is a concern for practitioners requiring strong guarantees on false findings within specific results (Luo et al., 2020, Ebadi et al., 2023).

A probabilistic strengthening of FDR control is False Discovery Proportion-exceedance control (FDX):

$$\mathbb{P}(\mathrm{FDP} > \alpha) \leq \gamma,$$

for a user-specified tolerance $\gamma \in (0,1)$. While classic FDR procedures do not address FDX, post-hoc prediction bands or step-down methods can provide explicit bounds on the realized FDP with high probability.
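To see why expectation control alone says little about the exceedance probability, one can apply Markov's inequality to the FDP. The short derivation below is a standard fact rather than a result from the cited papers, and the threshold $t$ is notation introduced only for this illustration.

```latex
% Markov's inequality for the nonnegative random variable FDP, for any t > 0:
\mathbb{P}(\mathrm{FDP} \geq t) \;\leq\; \frac{\mathbb{E}[\mathrm{FDP}]}{t} \;\leq\; \frac{\alpha}{t}.
% At t = \alpha the bound equals 1, so it is vacuous.
% Example: \alpha = 0.05 and t = 0.10 give only \mathbb{P}(\mathrm{FDP} \geq 0.10) \leq 0.5.
```

In other words, FDR control at level $\alpha$ by itself yields no nontrivial bound on $\mathbb{P}(\mathrm{FDP} > \alpha)$, which is the gap that FDX-type guarantees address.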

3. The FDP-SD Step-Down Procedure

FDP-SD is an adaptation of generalized step-down control for the competition context. Its core innovation is a set of data-dependent bounds that guarantee, with prescribed confidence $1 - \gamma$, that the FDP does not exceed $\alpha$. The method proceeds as follows (Luo et al., 2020):

Critical values construction: Given the sorted sequence $W_1 \geq \dots \geq W_m$, for each $i$, FDP-SD computes

  • $k(d) = \left\lfloor (i-d)\alpha \right\rfloor + 1$
  • $\delta_i = \max \{ d \in \{-1, 0, \dots, i\} : \mathrm{BinCDF}(k(d)+d, 1/2)(d) \leq \gamma \}$

where $\mathrm{BinCDF}(n, p)(x) = \mathbb{P}[\mathrm{Binomial}(n, p) \leq x]$.
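A worked instance may help make the critical values concrete. The numbers below use $\alpha = 0.1$ and $\gamma = 0.05$, which are illustrative values chosen for this example and not taken from the source.

```latex
% Illustrative values: \alpha = 0.1, \gamma = 0.05, i = 40.
\begin{align*}
d = 0:&\quad k(0) = \lfloor 40 \cdot 0.1 \rfloor + 1 = 5,
  & \mathrm{BinCDF}(5, 1/2)(0) &= 2^{-5} = 0.03125 \leq 0.05, \\
d = 1:&\quad k(1) = \lfloor 39 \cdot 0.1 \rfloor + 1 = 4,
  & \mathrm{BinCDF}(5, 1/2)(1) &= \tfrac{1 + 5}{32} = 0.1875 > 0.05.
\end{align*}
% Larger d only increases the CDF value in this instance, so \delta_{40} = 0.
% For i = 39: k(0) = \lfloor 3.9 \rfloor + 1 = 4 and 2^{-4} = 0.0625 > 0.05, so \delta_{39} = -1.
```

With these values, a list of 40 top-ranked hypotheses tolerates no decoy wins at all, and a list of 39 cannot be certified even when it contains no decoy wins.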

Algorithm:

  1. For each $i \geq i_0 = \max\{1, \lceil (\lceil \log_2(1/\gamma) \rceil - 1)/\alpha \rceil \}$, compute $\delta_i$.
  2. Define $k_\text{FDP-SD}$ as the largest $i \in \{i_0, \dots, m\}$ such that $D_j \leq \delta_j$ for all $j$ with $i_0 \leq j \leq i$.
  3. Report all target wins among the top $k_\text{FDP-SD}$ hypotheses.
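The step-down rule can be sketched in Python as follows. This is an illustrative reading of the steps above, not the authors' code: the function names are hypothetical, exact rational arithmetic via `fractions.Fraction` is a choice made here to avoid floating-point trouble in the floor and the CDF comparison, the check $D_j \leq \delta_j$ is applied only from $j = i_0$ onward, and the result is 0 when even the check at $i_0$ fails.

```python
from fractions import Fraction
from math import ceil, comb, floor, log2
from typing import Sequence


def binom_half_cdf(n: int, x: int) -> Fraction:
    """Exact P[Binomial(n, 1/2) <= x]."""
    if x < 0:
        return Fraction(0)
    return Fraction(sum(comb(n, j) for j in range(min(x, n) + 1)), 2 ** n)


def delta(i: int, alpha: Fraction, gamma: Fraction) -> int:
    """Critical value delta_i for the top-i list."""
    best = -1  # d = -1 always qualifies, since the CDF at -1 is 0
    for d in range(i + 1):
        k_d = floor((i - d) * alpha) + 1
        if binom_half_cdf(k_d + d, d) <= gamma:
            best = d
    return best


def fdp_sd(labels_sorted: Sequence[int], alpha, gamma) -> int:
    """Return k_FDP-SD for labels L_i (+1 / -1) sorted by decreasing W_i."""
    alpha, gamma = Fraction(str(alpha)), Fraction(str(gamma))
    m = len(labels_sorted)
    i0 = max(1, ceil((ceil(log2(1 / gamma)) - 1) / alpha))
    # D_j counts every decoy win in the top j, including those before i0.
    decoy_wins = sum(1 for lab in labels_sorted[:i0 - 1] if lab == -1)
    k = 0
    for i in range(i0, m + 1):
        if labels_sorted[i - 1] == -1:
            decoy_wins += 1
        if decoy_wins > delta(i, alpha, gamma):
            break  # step-down: stop at the first violated critical value
        k = i
    return k


# 44 target wins, then one decoy win, then 30 more target wins.
labels = [1] * 44 + [-1] + [1] * 30
k_example = fdp_sd(labels, alpha="0.1", gamma="0.05")
```

The discoveries are then the target wins among the first `k_example` sorted hypotheses. Recomputing each `delta(i, ...)` from scratch is quadratic in `i` and is done here only for clarity.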

Guarantee: Under exchangeability, the realized FDP among reported discoveries is controlled:

$$\mathbb{P}( \mathrm{FDP} > \alpha ) \leq \gamma$$

(Luo et al., 2020).

The method generalizes to multiple decoys and can accommodate further tuning via parameters $c$ and $\lambda$ for aggressive TDC/knockoff variants.

4. Alternative FDP Prediction Bands: TDC-SB and TDC-UB

Instead of a step-down rule, prediction bands can wrap around any competition-based FDR method to provide a running upper confidence bound $\bar Q_{k^*}$ on the realized FDP (Ebadi et al., 2023). The TDC-SB (Standardized Band) and TDC-UB (Uniform Band) procedures consider, for each number of decoy wins $d$:

  • $N_d$: number of true-null target wins prior to the $d$th decoy win.
  • The key stochastic upper bound on $N_d$ involves augmenting the observed process so that $N_d$ is stochastically dominated by a negative-binomial process $U_d \sim \mathrm{NB}(d, R)$, with $R$ a function of TDC/knockoff parameters.

Two constructions:

  • TDC-SB: Uses normal approximations to produce

$$\xi_d = B d + z^{1-\gamma}_\Delta \sqrt{B(1+B)d}$$

where $z^{1-\gamma}_\Delta$ is the $1-\gamma$ quantile of the standardized process and $B = c/(1-\lambda)$.

  • TDC-UB: Uses the quantile transformation of $U_d$ to define

$$\xi_d = \beta_d^{1-u_\gamma}$$

where $u_\gamma$ is chosen so that $\mathbb{P}(\min_d \tilde U_d \leq u_\gamma ) \leq \gamma$.
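For the standardized band, the bound $\xi_d$ is a closed-form function of $d$ once the quantile is known. The sketch below evaluates only that formula: the function name is hypothetical, the quantile `z` is treated as a given input because the text does not say how it is computed, and the values `c = 0.5`, `lam = 0.5`, `z = 2.0` in the usage line are placeholders rather than recommended settings.

```python
from math import sqrt


def tdc_sb_bound(d: int, z: float, c: float, lam: float) -> float:
    """xi_d = B*d + z*sqrt(B*(1+B)*d), with B = c / (1 - lam)."""
    b = c / (1.0 - lam)
    return b * d + z * sqrt(b * (1.0 + b) * d)


# Placeholder inputs: B = 0.5 / (1 - 0.5) = 1, so xi_4 = 4 + 2 * sqrt(8).
xi_4 = tdc_sb_bound(d=4, z=2.0, c=0.5, lam=0.5)
```

The first term is the linear trend $Bd$ and the second is a margin that grows like $\sqrt{d}$, so the relative width of the band shrinks as more decoy wins are observed.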

These bounds are then mapped back onto the sorted hypotheses; for any $i$, an upper bound $\bar Q_i$ on the FDP is provided such that $\mathbb{P}(\exists i: \mathrm{FDP}_i > \bar Q_i) \leq \gamma$ (Ebadi et al., 2023).

Empirically, both SB and UB bands are much tighter than the Katsevich–Ramdas band, and in practical scenarios (proteomics, GWAS, model-X knockoff regression) the upper FDP bounds are near sharp, especially using UB (Ebadi et al., 2023).

5. Theoretical Underpinnings and Assumptions

The key probabilistic structure underlying competition-based FDR control is:

  • Exchangeability: For each true null, the joint distribution of $(Z_i, \tilde Z_i)$ is invariant to swapping its two entries, so $L_i$ is equally likely to be $+1$ or $-1$.
  • Independence: Labels $L_i$ for true nulls are independent of one another, and their distribution does not depend on the scores of false nulls or on their own winning scores (Luo et al., 2020, Ebadi et al., 2023).

These assumptions support the exact calibration of binomial or negative-binomial upper bounding distributions, enabling both expectation control (FDR) and tail probability control (FDX/prediction bands).

6. Practical Implications, Computation, and Extensions

FDP-SD and prediction bands (TDC-SB, TDC-UB) are computationally tractable:

  • Sorting the $W_i$ requires $O(m \log m)$ time, while computation of $\delta_i$ or $\xi_d$ can be made $O(1)$ per $i$ or $d$ via precomputed tables or incremental updates.
  • Flexibility to trade off the nominal level $\alpha$ and the tail probability $\gamma$ is explicit, in contrast to basic FDR procedures.

Applications include:

  • Proteomics: direct substitution of TDC final thresholding with FDP-SD for strong FDP control.
  • High-dimensional regression: model-X knockoff pipelines may use the same combinatorial logic, with prediction bands for post-hoc validation.
  • GWAS and simulation studies consistently demonstrate that FDP-SD dominates competing methods (e.g., Katsevich–Ramdas band) in terms of power and tightness of FDP bounds, with only minor sacrifice relative to ordinary FDR control.

Generalizations to multiple decoys and randomization for exact calibration are feasible. Future extensions suggested include the optimization of aggressiveness parameters and exploring improved sharpenings for multi-decoy frameworks (Luo et al., 2020, Ebadi et al., 2023).

7. Historical and Methodological Context

Competition-based FDR control, formalized in proteomics (target–decoy) and generalized in statistics through model-X knockoffs, represents a distinctive approach leveraging explicit null labeling and exchangeability. Standard FDR control operates in expectation; methods such as FDP-SD and its prediction-band analogues elevate the guarantee to the probability that the realized FDP never exceeds a threshold, thus filling a fundamental gap between average-case and post-hoc, reproducible false discovery control. The combination of combinatorial structure, probabilistic null modeling, and analytical prediction bands sets competition-based control apart as a unifying theme for rigorous, interpretable multiple testing (Luo et al., 2020, Ebadi et al., 2023).
