Irreproducible Discovery Rate (IDR)
- IDR is a statistical framework that quantifies reproducibility in replicate high-throughput experiments using a copula-mixture model and local thresholding.
- It adapts false discovery rate principles to assess signal consistency across replicates, enabling rigorous threshold selection for reliable discoveries.
- Applications in ChIP-seq and genomic studies demonstrate IDR’s robust calibration and improved power over conventional p-value and single-replicate methods.
The Irreproducible Discovery Rate (IDR) quantifies the reproducibility of findings arising in replicate high-throughput experiments, such as ChIP-seq peak calling or genomic association studies. Conceived as an analog to the false discovery rate (FDR), IDR adapts multiple testing principles to the problem of identifying signals that are consistent across experimental repeats. Reproducibility is modeled through a copula-mixture model fit to the ranked signal lists produced by the replicates. The IDR framework supports principled threshold selection, graphical analysis of replicate agreement, and empirical reproducibility assessment while accommodating arbitrary marginal scoring scales for each replicate (Li et al., 2011; Wei et al., 2013).
1. Foundation and Formal Analogy to FDR
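IDR is motivated by the deficiencies of scalar agreement metrics (e.g., correlation, overlap counts) for quantifying replicate consistency in high-throughput data. Drawing inspiration from FDR frameworks, which model test statistics as a mixture of null and alternative distributions to estimate the local false discovery rate, the IDR approach reinterprets each signal as arising from an irreproducible ($K_i = 0$) or reproducible ($K_i = 1$) process. Let $(x_{i,1}, x_{i,2})$ denote the scores/peak heights for signal $i$ in two replicates, and $(u_{i,1}, u_{i,2})$ be their probability-integral transforms under the respective marginal empirical distributions, so that $u_{i,j} \sim \mathrm{Uniform}(0,1)$ marginally.
IDR is then defined at the level of rejection regions $\Gamma \subseteq [0,1]^2$ as
$$\mathrm{IDR}(\Gamma) = P\big(K_i = 0 \mid (u_{i,1}, u_{i,2}) \in \Gamma\big),$$
with local IDR for a particular observed pair $(u_{i,1}, u_{i,2})$ given by
$$\mathrm{idr}(u_{i,1}, u_{i,2}) = P\big(K_i = 0 \mid u_{i,1}, u_{i,2}\big) = \frac{\pi_0\, c_0(u_{i,1}, u_{i,2})}{\pi_0\, c_0(u_{i,1}, u_{i,2}) + \pi_1\, c_1(u_{i,1}, u_{i,2})},$$
where $\pi_0$ and $\pi_1 = 1 - \pi_0$ are the proportions of irreproducible and reproducible signals and $c_0$, $c_1$ are the corresponding component densities.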
These expressions are directly analogous to the local FDR and overall FDR in multiple testing scenarios, replacing the notion of a null with “irreproducible” signals and of an alternative with “reproducible” ones (Li et al., 2011).
2. Copula Mixture Model of Replicate Rankings
To capture the dependence structure between replicates, the distribution of $(u_{i,1}, u_{i,2})$ is modeled as a two-component copula mixture:
$$c(u_1, u_2) = \pi_0\, c_0(u_1, u_2) + \pi_1\, c_1(u_1, u_2), \qquad \pi_0 + \pi_1 = 1,$$
where
- $c_0$ is typically the independence copula ($c_0(u_1, u_2) = 1$), representing noise or signals that do not agree,
- $c_1$ is a bivariate (Gaussian) copula density with positive correlation $\rho_1 > 0$, representing reproducible signals.
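As a rough illustration of the two components listed above, the following Python sketch evaluates a Gaussian copula density and the resulting local IDR for a single pair of rank-based scores, taking the irreproducible component to be the independence copula with density 1. The function names and the parameter values (`pi1=0.4`, `rho=0.9`) are hypothetical choices for illustration, not values from the cited papers.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal, used for the quantile transform

def gaussian_copula_density(u1, u2, rho):
    """Density of the bivariate Gaussian copula with correlation rho at (u1, u2).

    Both u1 and u2 must lie strictly between 0 and 1.
    """
    a, b = STD.inv_cdf(u1), STD.inv_cdf(u2)
    expo = -(rho * rho * (a * a + b * b) - 2.0 * rho * a * b) / (2.0 * (1.0 - rho * rho))
    return math.exp(expo) / math.sqrt(1.0 - rho * rho)

def local_idr(u1, u2, pi1, rho):
    """P(irreproducible | u1, u2) with independence c0 = 1 and Gaussian c1."""
    c0 = 1.0
    c1 = gaussian_copula_density(u1, u2, rho)
    return (1.0 - pi1) * c0 / ((1.0 - pi1) * c0 + pi1 * c1)

# Illustrative (hypothetical) parameter values
concordant = local_idr(0.95, 0.95, pi1=0.4, rho=0.9)  # both replicates rank the signal highly
discordant = local_idr(0.95, 0.05, pi1=0.4, rho=0.9)  # replicates disagree
```

In this simplified form a concordant pair receives a local IDR below the prior irreproducible proportion and a discordant pair a value above it. Because a Gaussian copula is symmetric, pairs that agree at the low end of both lists are also scored as concordant here.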
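Latent variables $(z_{i,1}, z_{i,2})$ are modeled as jointly normal given the component label $K_i = k$, with parameters $(\mu_k, \sigma_k^2, \rho_k)$ for $k \in \{0, 1\}$. The marginal transformation $u_{i,j} = G(z_{i,j})$, where $G$ is the marginal distribution function of the latent mixture, maps these to unit-interval margins, combining the reproducible and irreproducible components in the mixture proportion $\pi_1$, with labels generated as $K_i \sim \mathrm{Bernoulli}(\pi_1)$. This construction enables flexible modeling of signal consistency even under unknown or noncomparable replicate scoring scales (Li et al., 2011; Wei et al., 2013).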
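The generative description above can be sketched in a few lines of Python. This is a minimal simulation under stated assumptions: the irreproducible component is fixed at a standard normal with zero correlation (a common normalization for identifiability), both coordinates of a component share one mean and one standard deviation, and the names `mixture_cdf` and `simulate` as well as the parameter values are hypothetical.

```python
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal

def mixture_cdf(z, pi1, mu1, sigma1):
    """Marginal CDF G of one latent coordinate:
    pi1 * N(mu1, sigma1^2) + (1 - pi1) * N(0, 1)."""
    return pi1 * STD.cdf((z - mu1) / sigma1) + (1.0 - pi1) * STD.cdf(z)

def simulate(n, pi1, mu1, sigma1, rho1, seed=0):
    """Draw labels K_i and unit-interval scores (u_i1, u_i2) from the latent model."""
    rng = random.Random(seed)
    labels, u = [], []
    for _ in range(n):
        k = 1 if rng.random() < pi1 else 0          # K_i ~ Bernoulli(pi1)
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        if k == 1:
            # Reproducible: correlated bivariate normal with common mean and sd
            z1 = mu1 + sigma1 * e1
            z2 = mu1 + sigma1 * (rho1 * e1 + (1.0 - rho1 ** 2) ** 0.5 * e2)
        else:
            # Irreproducible: independent standard normals (assumed normalization)
            z1, z2 = e1, e2
        labels.append(k)
        u.append((mixture_cdf(z1, pi1, mu1, sigma1), mixture_cdf(z2, pi1, mu1, sigma1)))
    return labels, u

labels, u = simulate(4000, pi1=0.3, mu1=2.5, sigma1=1.0, rho1=0.8, seed=1)
```

Because $G$ is the marginal distribution of each latent coordinate, the simulated scores are approximately uniform on the unit interval in each replicate, which is what makes the model insensitive to the original scoring scale.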
3. Parameter Estimation and Thresholding
Model fitting proceeds via an EM-type algorithm, integrating empirical marginal transformation into the iterative procedure:
- Pseudo-data: At each EM round, replicate values $(x_{i,1}, x_{i,2})$ are ranked to uniform scores $(u_{i,1}, u_{i,2})$, then mapped to latent normal space via the inverse mixture marginal, $z_{i,j} = G^{-1}(u_{i,j})$, using the current parameter estimates.
- E-step: Posterior mixture responsibilities $P(K_i = 1 \mid z_{i,1}, z_{i,2})$ are updated based on the Gaussian copula densities.
- M-step: Parameters $(\pi_1, \mu_1, \sigma_1^2, \rho_1)$ are updated via weighted empirical means and variances.
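The three steps above can be sketched as follows. This is an illustrative, unoptimized implementation under several assumptions that are not taken from the source: the irreproducible component is fixed at a standard normal with zero correlation, the inverse mixture marginal is computed by bisection, one E-step and one M-step are run per pseudo-data refresh, and a fixed number of iterations replaces a convergence check. All function names, starting values, and simulation settings are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal

def mixture_quantile(u, pi1, mu1, sigma1):
    """Invert the mixture marginal G by bisection (G is strictly increasing)."""
    lo, hi = -30.0, 30.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        g = pi1 * STD.cdf((mid - mu1) / sigma1) + (1.0 - pi1) * STD.cdf(mid)
        lo, hi = (mid, hi) if g < u else (lo, mid)
    return 0.5 * (lo + hi)

def ranks(x):
    """Ascending ranks 1..n; ties are broken by position."""
    r = [0] * len(x)
    for k, i in enumerate(sorted(range(len(x)), key=lambda i: x[i]), start=1):
        r[i] = k
    return r

def bvn_pdf(z1, z2, mu, sigma, rho):
    """Bivariate normal density with common mean, common sd, and correlation rho."""
    a, b = (z1 - mu) / sigma, (z2 - mu) / sigma
    q = (a * a - 2.0 * rho * a * b + b * b) / (1.0 - rho * rho)
    return math.exp(-0.5 * q) / (2.0 * math.pi * sigma * sigma * math.sqrt(1.0 - rho * rho))

def e_step(z, pi1, mu1, sigma1, rho1):
    """Posterior probability that each pair is reproducible (K_i = 1)."""
    resp = []
    for z1, z2 in z:
        h1 = pi1 * bvn_pdf(z1, z2, mu1, sigma1, rho1)
        h0 = (1.0 - pi1) * bvn_pdf(z1, z2, 0.0, 1.0, 0.0)
        resp.append(h1 / (h0 + h1))
    return resp

def m_step(z, resp):
    """Weighted mean, variance, and correlation of the reproducible component."""
    w = sum(resp)
    mu1 = sum(r * (z1 + z2) for r, (z1, z2) in zip(resp, z)) / (2.0 * w)
    var1 = sum(r * ((z1 - mu1) ** 2 + (z2 - mu1) ** 2) for r, (z1, z2) in zip(resp, z)) / (2.0 * w)
    cov1 = sum(r * (z1 - mu1) * (z2 - mu1) for r, (z1, z2) in zip(resp, z)) / w
    rho1 = max(-0.999, min(0.999, cov1 / var1))  # keep the density well defined
    return w / len(z), mu1, math.sqrt(var1), rho1

def fit_idr(x1, x2, n_iter=30, init=(0.5, 1.0, 1.0, 0.5)):
    """Return fitted (pi1, mu1, sigma1, rho1) and the local IDR of each pair."""
    n, r1, r2 = len(x1), ranks(x1), ranks(x2)
    pi1, mu1, sigma1, rho1 = init

    def pseudo_data():
        # Both replicates share the rank grid k/(n+1), so G^{-1} is evaluated once per rank
        grid = [mixture_quantile(k / (n + 1), pi1, mu1, sigma1) for k in range(1, n + 1)]
        return [(grid[a - 1], grid[b - 1]) for a, b in zip(r1, r2)]

    for _ in range(n_iter):
        z = pseudo_data()
        pi1, mu1, sigma1, rho1 = m_step(z, e_step(z, pi1, mu1, sigma1, rho1))
    z = pseudo_data()
    local_idr = [1.0 - r for r in e_step(z, pi1, mu1, sigma1, rho1)]
    return (pi1, mu1, sigma1, rho1), local_idr

# Simulated replicates on deliberately different scales (hypothetical settings)
rng = random.Random(0)
labels, x1, x2 = [], [], []
for _ in range(400):
    k = 1 if rng.random() < 0.4 else 0
    e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    if k:
        a, b = 2.5 + e1, 2.5 + 0.85 * e1 + (1.0 - 0.85 ** 2) ** 0.5 * e2
    else:
        a, b = e1, e2
    labels.append(k)
    x1.append(math.exp(a))   # arbitrary monotone rescaling of replicate 1
    x2.append(10.0 * b)      # a different scale for replicate 2
params, idr = fit_idr(x1, x2)
```

Only the ranks of `x1` and `x2` enter the fit, so the two replicates can be scored on unrelated scales, as in the simulated example. A production implementation would add a convergence criterion and guard against degenerate fits.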
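The process alternates E and M steps, updating pseudo-data at each iteration, and typically converges reliably. Once fit, local IDR values $\mathrm{idr}_i = P(K_i = 0 \mid x_{i,1}, x_{i,2})$ are computed. Signals are then ranked by increasing local IDR, $\mathrm{idr}_{(1)} \le \dots \le \mathrm{idr}_{(n)}$, and for each $l = 1, \dots, n$,
$$\mathrm{IDR}_{(l)} = \frac{1}{l} \sum_{i=1}^{l} \mathrm{idr}_{(i)}.$$
Threshold selection proceeds by choosing the largest $l$ so that $\mathrm{IDR}_{(l)} \le \alpha$ for a chosen level $\alpha$ (e.g., $\alpha = 0.05$), analogous to the Benjamini-Hochberg step-up for FDR control (Li et al., 2011).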
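The selection rule can be illustrated with a short Python sketch: sort the local IDR values, take running means, and keep the largest prefix whose mean stays at or below the chosen level. The function name `idr_threshold` and the toy values are hypothetical.

```python
def idr_threshold(local_idr, alpha=0.05):
    """Step-up selection by local IDR.

    Returns the indices of the selected signals (most reproducible first)
    and the running IDR, i.e. the mean of the l smallest local IDR values.
    """
    order = sorted(range(len(local_idr)), key=lambda i: local_idr[i])
    running, total, n_keep = [], 0.0, 0
    for l, i in enumerate(order, start=1):
        total += local_idr[i]
        running.append(total / l)
        if running[-1] <= alpha:
            n_keep = l  # running means are nondecreasing, so this ends at the largest valid l
    return order[:n_keep], running

# Toy local IDR values (hypothetical)
toy = [0.01, 0.9, 0.02, 0.2, 0.03]
selected, running = idr_threshold(toy, alpha=0.05)
```

With these toy values the three signals with local IDR 0.01, 0.02, and 0.03 are kept at level 0.05. Raising the level to 0.1 also admits the signal with local IDR 0.2, because the running mean over the four best signals is 0.065: the rule controls the average irreproducibility of the selected set, not each signal's own local value.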
4. Assessment via Correspondence Curves
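To visualize loss of reproducibility as a function of rank, the correspondence curve is defined as
$$\Psi_n(t) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\big\{x_{i,1} > x_{(\lceil (1-t)n \rceil),1},\ x_{i,2} > x_{(\lceil (1-t)n \rceil),2}\big\}, \qquad 0 \le t \le 1,$$
where $x_{(k),j}$ is the $k$-th smallest score in replicate $j$, so that $\Psi_n(t)$ is the fraction of signals ranked in the top $t$ fraction of both replicates.
In the population, $\Psi(t) = P\big(x_1 > F_1^{-1}(1-t),\ x_2 > F_2^{-1}(1-t)\big)$, with $F_j$ the marginal distribution of scores in replicate $j$. For perfect dependence, $\Psi(t) = t$; for independence, $\Psi(t) = t^2$. The derivative $\Psi'(t)$ highlights breakpoints at which replicate agreement abruptly drops, and thus helps to guide appropriate cutoff selection for calling reproducible signals. This graphical assessment complements the formal mixture modeling and thresholding procedures of IDR (Li et al., 2011).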
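A minimal Python sketch of the empirical correspondence curve follows. To avoid floating-point rounding in the cutoff, it evaluates the curve on the grid t = m/n, where m is the number of top-ranked signals considered in each replicate. The function names are hypothetical.

```python
def desc_ranks(x):
    """Rank 1 = largest score; ties are broken by position."""
    r = [0] * len(x)
    for k, i in enumerate(sorted(range(len(x)), key=lambda i: -x[i]), start=1):
        r[i] = k
    return r

def correspondence_curve(x1, x2):
    """Return [(t, Psi_n(t))] for t = m/n, m = 0..n.

    Psi_n(m/n) is the fraction of signals ranked in the top m of both replicates.
    """
    n = len(x1)
    # A signal is in the top m of both lists exactly when max(rank1, rank2) <= m
    worst = [max(a, b) for a, b in zip(desc_ranks(x1), desc_ranks(x2))]
    counts = [0] * (n + 1)
    for w in worst:
        counts[w] += 1
    curve, running = [], 0
    for m in range(n + 1):
        running += counts[m]
        curve.append((m / n, running / n))
    return curve

scores = [5.0, 3.0, 9.0, 1.0]
perfect = correspondence_curve(scores, scores)                 # identical rankings
reversed_ = correspondence_curve(scores, [-s for s in scores])  # opposite rankings
```

For identical rankings the curve follows the diagonal, matching the perfect-dependence case. For opposite rankings it stays at zero until more than half of each list is included. Successive differences of the second coordinate give a discrete analog of the derivative used to locate breakpoints.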
5. Model Extensions: Survival Copula Mixture
The original IDR framework is limited to loci appearing in both replicate lists ("overlap-only"), which can lead to overestimated reproducibility when overlap is small but concordance within the overlap is high. The survival copula mixture model (SCOP) reformulates the two-list comparison as a bivariate survival problem, allowing for the inclusion of censored (i.e., non-overlapping) loci:
- Loci unique to one list are treated as right-censored observations, with truncation at the cutoff of the other list.
- For each locus, the observed (possibly censored) pair of scores enters a mixture likelihood that combines survival and density contributions according to the censoring pattern.
- The model fits marginal survival curves via a weighted Kaplan–Meier estimate within each mixture component and updates the copula parameters accordingly via EM.
After convergence, a local survival-IDR can be computed for all loci, including those absent from one of the two lists. This approach restores power for IDR estimation and corrects the misleadingly low irreproducibility estimates produced by the overlap-only approach, especially as overlap decreases (Wei et al., 2013).
6. Empirical Performance and Applications
Simulation studies demonstrate the calibration and efficacy of IDR-based thresholding: local IDR thresholds are well calibrated (the nominal IDR level matches the empirical proportion of irreproducible calls) except in the presence of complex artifact structure, and IDR-based call sets trade off true against false discoveries more favorably than single-replicate significance thresholds or conventional p-value combination methods. In real ChIP-seq applications, IDR reproducibility profiles depend strongly on the quality of the underlying peak callers: high-reproducibility callers yield strong fitted reproducibility parameters and thousands of reproducible calls at a stringent IDR threshold, with IDR-selected peaks exhibiting strong enrichment for high-confidence motif occurrences (Li et al., 2011).
Use of the survival-copula (SCOP) model further rectifies underestimation of irreproducibility in cases where overlap is small. For ENCODE replicates with ~20–40% overlap, overlap-only IDR yielded a markedly lower irreproducibility estimate than the SCOP analysis, whose estimate aligned better with the empirical presence of irreproducible loci (Wei et al., 2013).
7. Limitations, Theoretical Properties, and Practical Considerations
Theoretical analysis demonstrates that, under correct model specification and i.i.d. sampling, step-up selection by local IDR is asymptotically optimal for maximizing reproducible discovery yield at controlled IDR levels (in the sense of Sun & Cai, 2007). The rank-based estimation grants near-parametric efficiency under continuous marginals.
Limitations include:
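- Replicate independence: the framework treats agreement between replicates as evidence of real signal, so systematic biases shared across replicates can generate artifactual reproducibility.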
- Signal dependence: most models assume i.i.d. signals; spatial or other correlations among signals can reduce accuracy but empirical performance is robust unless dependence is strong.
- Model misspecification: two-component mixtures may misclassify signals if genuine reproducibility spans more than two strata; models with more than two components can be fitted but complicate interpretation.
- Small-n scenarios: poor mixture separation or small sample sizes can impair convergence or result in misassigned components; careful assessment of model fit and estimated parameters is essential.
Empirical application supports IDR as a principled, scale-free, statistically grounded method for quantifying and controlling reproducibility in high-throughput experiments (Li et al., 2011, Wei et al., 2013).