Expected Discovery Rate in Testing
- EDR is defined as the expected proportion of true discoveries among nonnull hypotheses, offering a clear measure of statistical power in multiple testing.
- Plug-in and weighted procedures adjust rejection thresholds—using kernel estimates and informative weights—to enhance true discovery rates while maintaining FDR control.
- Online algorithms (e.g., LORD, LOND) dynamically adapt significance levels in sequential testing, while frameworks such as the e-Closure principle provide simultaneous, post hoc valid error control; both aim to balance error control with maximized EDR.
Expected Discovery Rate (EDR) is a central concept in the theory and practice of multiple hypothesis testing, representing the expected proportion or number of true discoveries (correct rejections) made by a testing procedure. It provides a quantitative assessment of the statistical power achieved under error control constraints such as the False Discovery Rate (FDR). The technical literature addresses EDR both directly—optimizing power under FDR control—and indirectly—by analyzing the performance (yield of true discoveries) across a variety of adaptive, weighted, and online algorithms.
1. Formal Definition and Relationship to FDR
In the context of multiple hypothesis testing, suppose that among $m$ tested hypotheses, $V$ is the number of false rejections (type I errors) and $R$ is the total number of rejections. The False Discovery Rate is defined as

$$\mathrm{FDR} = \mathbb{E}\left[\frac{V}{R \vee 1}\right],$$

where the maximum with 1 in the denominator ensures well-definedness in the event of zero rejections.
The Expected Discovery Rate (EDR, an editor's term) is designed to quantify statistical power in this context. In its common formulation,

$$\mathrm{EDR} = \mathbb{E}\left[\frac{S}{m_1 \vee 1}\right],$$

where $S$ is the number of true rejections (type II errors avoided) and $m_1$ is the number of true alternatives (nonnulls). Maximizing EDR under controlled FDR is a principal objective in the design of multiple testing procedures (Neuvial, 2010, Basu et al., 2015, Nie et al., 2023).
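For concreteness, the following minimal Monte Carlo sketch estimates both quantities empirically under an assumed one-sided Gaussian mixture model, using the standard BH procedure as the testing rule; the model, effect size, and function names are illustrative choices rather than anything taken from the cited papers.

```python
import numpy as np
from scipy.stats import norm

def bh_rejections(pvals, alpha):
    """Benjamini-Hochberg step-up: return a boolean mask of rejected hypotheses."""
    m = len(pvals)
    order = np.argsort(pvals)
    passed = pvals[order] <= alpha * np.arange(1, m + 1) / m
    k = passed.nonzero()[0].max() + 1 if passed.any() else 0
    rejected = np.zeros(m, dtype=bool)
    rejected[order[:k]] = True
    return rejected

def simulate_fdr_edr(m=1000, pi1=0.1, mu=3.0, alpha=0.1, reps=200, seed=0):
    """Monte Carlo estimates of FDR = E[V / (R v 1)] and EDR = E[S / (m1 v 1)]."""
    rng = np.random.default_rng(seed)
    fdp, tdp = [], []
    for _ in range(reps):
        nonnull = rng.random(m) < pi1                  # theta_i = 1 marks a true alternative
        z = rng.normal(loc=mu * nonnull, scale=1.0)    # one-sided Gaussian test statistics
        pvals = norm.sf(z)                             # p_i = P(Z > z_i) under the null
        rej = bh_rejections(pvals, alpha)
        V = np.sum(rej & ~nonnull)                     # false rejections
        S = np.sum(rej & nonnull)                      # true rejections
        fdp.append(V / max(rej.sum(), 1))
        tdp.append(S / max(nonnull.sum(), 1))
    return np.mean(fdp), np.mean(tdp)                  # (estimated FDR, estimated EDR)

print(simulate_fdr_edr())
```

Averaging the false discovery proportion and the true discovery proportion over repetitions gives direct empirical analogues of the FDR and EDR defined above.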
2. Plug-In Procedures and Asymptotic EDR Enhancement
Plug-in approaches estimate the proportion of true null hypotheses and accordingly inflate the rejection threshold, yielding procedures with tighter FDR control and asymptotically higher EDR:
- For independent hypotheses, the Benjamini-Hochberg (BH) procedure applied at target level $\alpha$ actually controls the FDR at $\pi_0\alpha$, with $\pi_0$ the true null proportion. The oracle procedure applies BH at level $\alpha/\pi_0$, controlling FDR at exactly $\alpha$. Since $\pi_0$ is unknown, plug-in methods estimate it, often using kernel-based estimators of the $p$-value density near 1 (Neuvial, 2010); a minimal sketch of the plug-in step follows this list.
- The plug-in threshold yields more powerful asymptotics (higher EDR) because a larger fraction of nonnulls is rejected, especially as the number of hypotheses $m$ grows, and this holds over a wider range of FDR levels.
- The trade-off is a slower convergence rate for the realized FDP: plug-in procedures converge at nonparametric rates that depend on the regularity of the $p$-value density at 1, in contrast to the classical BH procedure, which enjoys the parametric $1/\sqrt{m}$ rate.
- In models with well-behaved alternative $p$-value distributions (e.g., two-sided Gaussian, Laplace, Student's $t$), the plug-in procedure's EDR approaches that of the oracle BH at level $\alpha/\pi_0$ as $m$ increases, often far outperforming the standard BH method, but at the price of greater variability for small $m$.
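As a minimal sketch of the plug-in idea, the snippet below uses a simple Storey-type histogram estimate of $\pi_0$ from the $p$-value mass near 1 as a stand-in for the kernel estimator discussed above; the tuning parameter `lam` and all function names are assumptions made for illustration.

```python
import numpy as np

def estimate_pi0(pvals, lam=0.5):
    """Estimate the null proportion pi0 from the p-value mass above lam.
    (A simple Storey-type stand-in for the kernel estimator discussed above.)"""
    pvals = np.asarray(pvals, dtype=float)
    return min(1.0, np.mean(pvals > lam) / (1.0 - lam))

def plugin_bh(pvals, alpha=0.1, lam=0.5):
    """BH step-up applied at the inflated level alpha / pi0_hat."""
    pvals = np.asarray(pvals, dtype=float)
    pi0_hat = estimate_pi0(pvals, lam)
    level = alpha / pi0_hat                     # inflated target level
    m = len(pvals)
    order = np.argsort(pvals)
    passed = pvals[order] <= level * np.arange(1, m + 1) / m
    k = passed.nonzero()[0].max() + 1 if passed.any() else 0
    rejected = np.zeros(m, dtype=bool)
    rejected[order[:k]] = True
    return rejected, pi0_hat
```

When the estimate of $\pi_0$ is well below 1, the inflated level $\alpha/\hat{\pi}_0$ rejects a larger fraction of nonnulls than plain BH, which is the source of the asymptotic EDR gain; the extra variability of $\hat{\pi}_0$ for small $m$ is the price noted above.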
3. Weighted Multiple Testing and Decision-Theoretic Optimization
Weighted FDR (wFDR) procedures extend classical FDR control by imposing distinct importance weights ($b_i$ for true positives, $a_i$ for false discoveries) reflecting external biological or scientific information (Basu et al., 2015). The wFDR criterion is

$$\mathrm{wFDR} = \frac{\mathbb{E}\left[\sum_{i=1}^{m} a_i (1-\theta_i)\,\delta_i\right]}{\mathbb{E}\left[\sum_{i=1}^{m} a_i\,\delta_i\right]},$$

with $\delta_i \in \{0,1\}$ indicating rejection of hypothesis $i$, and $\theta_i \in \{0,1\}$ its status (null vs. nonnull).
The corresponding power metric is the Expected True Positives (ETP):

$$\mathrm{ETP} = \mathbb{E}\left[\sum_{i=1}^{m} b_i\,\theta_i\,\delta_i\right].$$
Oracle and data-driven procedures are constructed to maximize ETP subject to $\mathrm{wFDR} \le \alpha$. In practice, this framework allows prioritization of hypotheses (e.g., up-weighting SNPs with relevant prior evidence in GWAS) to achieve increased EDR in informative groups while maintaining stringent error control.
- The oracle procedure sorts hypotheses by a value-to-capacity ratio (VCR) and fills the rejection set according to the available wFDR "budget," maximizing true discoveries (see the sketch after this list).
- The data-driven analog uses estimated local false discovery rates (Lfdr) and a ranking statistic to find a threshold maximizing ETP while approximately controlling wFDR.
- Asymptotic analysis shows that the data-driven procedure achieves nearly the same ETP and wFDR as the oracle as $m \to \infty$.
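A schematic sketch of a VCR-style greedy rule is given below; the knapsack-type framing (gains $b_i(1-\mathrm{Lfdr}_i)$, costs $a_i(\mathrm{Lfdr}_i-\alpha)$) follows the description above, but the exact ranking statistic and calibration in Basu et al. (2015) differ in details, so this is an illustration rather than a reproduction of their procedure.

```python
import numpy as np

def vcr_greedy_rejections(lfdr, a, b, alpha):
    """Schematic value-to-capacity greedy selection for weighted FDR control.

    lfdr : (oracle or estimated) local false discovery rates P(theta_i = 0 | data)
    a, b : weights on false discoveries and true positives, respectively
    Each rejection contributes expected gain b_i * (1 - lfdr_i) to ETP and
    expected cost a_i * (lfdr_i - alpha) against the wFDR "budget".
    """
    lfdr, a, b = map(np.asarray, (lfdr, a, b))
    gain = b * (1.0 - lfdr)
    cost = a * (lfdr - alpha)

    reject = cost <= 0                         # "free" hypotheses create budget slack
    budget = -cost[reject].sum()

    costly = np.where(cost > 0)[0]
    order = costly[np.argsort(-gain[costly] / cost[costly])]   # rank by VCR
    for i in order:                            # spend the budget greedily
        if cost[i] <= budget:
            reject[i] = True
            budget -= cost[i]
        else:
            break
    return reject
```

The greedy step mirrors the intuition in the text: hypotheses are admitted in order of discovery value per unit of wFDR capacity consumed, until the budget is exhausted.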
4. Online Testing and Discovery Rate Dynamics
Online FDR-controlling algorithms such as LOND (Levels based On Number of Discoveries), LORD (Levels based On Recent Discovery), and generalized alpha-investing rules adaptively allocate significance thresholds as hypotheses are tested sequentially (Javanmard et al., 2015, Javanmard et al., 2016, Robertson et al., 2018).
- LOND and LORD increase EDR by boosting thresholds after successful discoveries, with LORD restarting at higher levels following each rejection. Theoretical guarantees establish nearly linear growth of the number of discoveries (and thus of EDR) as the length of the testing sequence grows, under mild assumptions on the $p$-value stream (a minimal LOND sketch follows this list).
- These methods perform comparably to offline BH in terms of EDR but are applicable when hypotheses arrive over time and full batch information is unavailable.
- Adjustments for dependent $p$-values (e.g., harmonic-sum scaling in LOND) preserve FDR control but may dampen power, reducing EDR compared to the idealized independent case.
- Empirical studies (in microarray, GWAS, and clinical trial contexts) confirm that the online procedures maintain FDR control and deliver high EDR, especially when a bounded upper limit on the number of hypotheses is used (i.e., for finite streams).
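A minimal sketch of the LOND update, the simplest of these rules, is shown below; the particular $\beta$ sequence and the form of the dependence correction are illustrative assumptions rather than the exact choices of Javanmard et al.

```python
import numpy as np

def lond(pvals, alpha=0.1, dependent=False):
    """LOND: test p-values online at levels that grow with the discovery count.

    beta is a fixed nonnegative sequence with sum <= alpha; the 1/(t(t+1))
    shape used here is one common illustrative choice.
    """
    pvals = np.asarray(pvals, dtype=float)
    n = len(pvals)
    t = np.arange(1, n + 1)
    beta = alpha / (t * (t + 1.0))             # sums to alpha * n/(n+1) <= alpha
    if dependent:
        beta = beta / np.cumsum(1.0 / t)       # harmonic-sum scaling for dependence (schematic)

    rejections = np.zeros(n, dtype=bool)
    discoveries = 0
    for i, p in enumerate(pvals):
        level = beta[i] * (discoveries + 1)    # boost the level after each discovery
        if p <= level:
            rejections[i] = True
            discoveries += 1
    return rejections
```

The boost factor (`discoveries + 1`) is what drives the growth of the discovery count in favorable regimes; the harmonic scaling applied under dependence correspondingly dampens power, as noted above.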
5. General Principles and the e-Closure Framework
The e-Closure Principle offers a unifying framework for multiple testing by “closing” over all subsets of hypotheses and simultaneously enforcing expected loss conditions, including both FDR and EDR control (Xu et al., 2 Sep 2025).
- Given an e-collection $\{e_S\}$ indexed by subsets $S$ of hypotheses and a loss function $L$, the closed procedure selects a rejection set only if the associated e-value condition holds simultaneously for every subset $S$.
- For FDR control, the loss recovers the classical false discovery proportion, while for EDR control, choosing a loss that attaches weight to the number of true discoveries generalizes the guarantee to the "expected discovery rate" or other metrics of interest.
- The framework allows post hoc flexibility: after the data are seen, one can select the error metric or nominal level to be controlled, as all candidate rejection sets returned by the closed procedure meet the requisite expected loss bounds.
- The closure principle enables uniform improvements to standard methods (e.g., e-Benjamini–Hochberg, e-Benjamini–Yekutieli) and supports simultaneous error control across many candidate sets, directly translating to more robust and potentially higher EDR via wider valid choices for rejection sets, as illustrated by the base e-BH sketch after this list.
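For reference, here is a minimal sketch of the base (unimproved) e-Benjamini–Hochberg rule, which the closure framework refines; the function name is an assumption, and the closure-based improvements of Xu et al. are not reproduced here.

```python
import numpy as np

def e_bh(e_values, alpha=0.1):
    """Base e-BH: reject the k* hypotheses with the largest e-values, where
    k* = max{k : k * e_[k] >= m / alpha} and e_[k] is the k-th largest e-value."""
    e = np.asarray(e_values, dtype=float)
    m = len(e)
    order = np.argsort(-e)                  # indices sorted by decreasing e-value
    k = np.arange(1, m + 1)
    ok = k * e[order] >= m / alpha
    k_star = k[ok].max() if ok.any() else 0
    rejected = np.zeros(m, dtype=bool)
    rejected[order[:k_star]] = True
    return rejected
```

Because the rule is stated directly in terms of e-values, the same rejection set can be certified against different nominal levels after the data are seen, which is the post hoc flexibility emphasized above.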
6. Asymptotic Limits, Trade-Offs, and Optimality
Recent advances characterize the fundamental limits of simultaneous FDR and EDR (or FNR) control in large-scale settings (Nie et al., 2023):
- The optimal trade-off between FDR and EDR cannot generally be achieved by separable decision rules; instead, compound or coordinated rules—sometimes yielding two-point representations via Carathéodory’s theorem—are required even in simple models such as the Gaussian mixture.
- For any feasible randomized strategy under an FDR constraint $\alpha$, the minimal expected "loss" (or maximal EDR) is achieved by a two-point random variable. This sufficiency result signifies that multiple testing procedures can be reduced, without loss, to decision rules that threshold at only two points (see the schematic sketch after this list).
- In settings where the false discovery proportion (FDP) must be controlled with high probability (rather than expectation), the optimal trade-off for EDR coincides with that for marginal FDR (mFDR).
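As a schematic illustration of the two-point idea in its simplest form, the sketch below randomizes between two lfdr-threshold rules whose estimated FDRs bracket the target $\alpha$; this is only an analogy to, not a reproduction of, the compound-optimal construction in Nie et al. (2023).

```python
import numpy as np

def two_point_rule(lfdr, alpha=0.1, rng=None):
    """Randomize between two deterministic lfdr-threshold rules so that the
    mixture exhausts the nominal level (schematic two-point strategy)."""
    rng = rng or np.random.default_rng()
    lfdr = np.asarray(lfdr, dtype=float)
    m = len(lfdr)
    order = np.argsort(lfdr)
    running_fdr = np.cumsum(lfdr[order]) / np.arange(1, m + 1)   # est. FDR if top-k are rejected

    k_lo = int(np.sum(running_fdr <= alpha))   # largest k whose estimated FDR is <= alpha
    if k_lo == m:
        k_star = m                             # the level is never exceeded: reject everything
    elif k_lo == 0:
        k_star = 0                             # even one rejection overshoots: reject nothing
    else:
        lo, hi = running_fdr[k_lo - 1], running_fdr[k_lo]
        p = (alpha - lo) / (hi - lo)           # mixing weight so the blend hits alpha exactly
        k_star = k_lo + 1 if rng.random() < p else k_lo
    rejected = np.zeros(m, dtype=bool)
    rejected[order[:k_star]] = True
    return rejected
```

The point mirrored here is that the attainable FDR-EDR frontier is traced out by mixtures of at most two deterministic rules rather than by a single separable threshold.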
7. Applications, Implications, and Limitations
The interplay between FDR and EDR is foundational for reproducibility, interpretability, and scientific yield, particularly in genomics, medical studies, and high-dimensional data analysis.
- By design, FDR control keeps the average rate of false discoveries low while, unlike stricter family-wise error criteria, allowing far more rejections, which indirectly raises EDR by permitting more true discoveries at less conservative thresholds.
- Adaptive, weighted, and online procedures enhance the opportunity for true discoveries (EDR) but introduce trade-offs in terms of statistical variability, convergence rate, and computational overhead.
- Plug-in, closure-based, and compound oracle procedures, supported by rigorous asymptotic theory, guarantee that as the scale of testing grows, statistical procedures can approach optimal EDR under stringent error control.
Despite advances, finite-sample fluctuations, dependence among $p$-values, and issues around the estimation or choice of error rates remain practical limitations affecting EDR. Methodological choices (kernel estimation bandwidth, weighting schemes, significance level allocation) must balance power against the risk of excess false discoveries.
Summary Table: EDR Optimization Across Approaches
| Procedure Type | EDR Optimization Mechanism | Trade-off / Limitation |
|---|---|---|
| Plug-in Kernel Estimator | Oracle-level FDR via inflated threshold | Nonparametric convergence rate |
| Weighted FDR | ETP maximization via informative weights | Weight specification, estimation |
| Online Testing (LORD/LOND) | Adaptive thresholds through discovery feedback | Dependency adjustment, finite-sample power |
| e-Closure Principle | Uniform control, post hoc flexibility | Computational complexity |
| Compound Oracle | Two-point strategy, optimal trade-off | Model assumptions, implementability |
These results collectively establish the theoretical and practical landscape for optimizing the Expected Discovery Rate in large-scale multiple testing, balancing it against strict control of the False Discovery Rate for reliable scientific discovery.