Mean False Error (MFE)
- Mean false error (MFE) is the expected number of false rejections in a family of tests, providing a clear average-case error measure in large-scale experiments.
- The Bonferroni procedure controls PFER by setting a stringent threshold on p-values, ensuring nonparametric control even under dependence.
- MFE is closely related to metrics like PCER, FWER, and FDR, with its stability and low variance offering practical advantages in high-dimensional testing settings.
The mean false error (MFE), also known as the per-family error rate (PFER), is a fundamental metric in multiple hypothesis testing. It is the expected number of false rejections (true null hypotheses that are incorrectly rejected) within a family of simultaneous tests. As an expected count, it gives a clear average-case measure of error in large-scale testing settings, such as those found in genomics and microarray experiments (0709.0366).
1. Formal Definition and Notation
Let $m$ denote the total number of hypotheses tested, $m_0$ the number of true nulls, $V$ the number of false rejections, and $R$ the total number of rejections. The mean false error (MFE) is defined as

$$\mathrm{MFE} = \mathrm{PFER} = \mathbb{E}[V].$$

This expectation is taken over the joint distribution of the test statistics or $p$-values. The terminology "mean false error" (MFE) is entirely synonymous with PFER in this context (0709.0366).
2. Bonferroni Procedure for PFER Control
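The Bonferroni procedure provides a direct approach to controlling the PFER at a user-specified level $\gamma$. With $m$ hypotheses in total, it rejects hypothesis $H_i$ if its $p$-value satisfies $p_i \le \gamma/m$. Without any assumptions on the dependence structure among $p$-values, the following bound holds, where $I_0$ is the index set of the $m_0$ true nulls and $V$ is the number of false rejections:

$$\mathbb{E}[V] = \sum_{i \in I_0} \Pr\left(p_i \le \gamma/m\right) \le \frac{m_0}{m}\,\gamma \le \gamma.$$

If all true null $p$-values are uniformly distributed, $\Pr(p_i \le \gamma/m) = \gamma/m$ for every $i \in I_0$, leading to

$$\mathbb{E}[V] = \frac{m_0}{m}\,\gamma.$$

This property illustrates that the Bonferroni rule offers strong and uniform control of the expected number of false discoveries, regardless of dependence among $p$-values (0709.0366).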
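The bound can be checked numerically. The following is a minimal sketch, not the cited paper's code: the function names, the Beta-distributed alternative p-values, and all parameter values (`m`, `m0`, `gamma`, `n_rep`) are illustrative assumptions. It applies the rule "reject when the p-value is at most gamma / m" and averages the number of false rejections over repeated draws with uniform null p-values.

```python
import numpy as np


def bonferroni_reject(pvals, gamma):
    """Reject hypothesis i when p_i <= gamma / m, with m = len(pvals)."""
    pvals = np.asarray(pvals, dtype=float)
    return pvals <= gamma / pvals.size


def simulate_mean_false_rejections(m=1000, m0=900, gamma=2.0, n_rep=2000, seed=0):
    """Monte Carlo estimate of E[V] for the Bonferroni rule.

    Assumptions (illustrative only): true-null p-values are independent
    Uniform(0, 1); alternative p-values follow Beta(0.1, 1), which
    concentrates mass near zero.
    """
    rng = np.random.default_rng(seed)
    false_counts = np.empty(n_rep)
    for r in range(n_rep):
        p_null = rng.uniform(size=m0)
        p_alt = rng.beta(0.1, 1.0, size=m - m0)
        reject = bonferroni_reject(np.concatenate([p_null, p_alt]), gamma)
        # The first m0 positions are the true nulls, so these are false rejections.
        false_counts[r] = reject[:m0].sum()
    return false_counts.mean()


estimate = simulate_mean_false_rejections()
```

With uniform nulls the estimate should sit near `m0 * gamma / m`, which is below the target `gamma` whenever some hypotheses are non-null.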
3. Relationship to Other Error Metrics
The MFE/PFER is interconnected with several other widely used error rates:
| Metric | Definition | Relationship to PFER |
|---|---|---|
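| PCER (Per-comparison) | $\mathbb{E}[V]/m$ | $\mathrm{PCER} = \mathrm{PFER}/m$ |
| FWER (Family-wise) | $\Pr(V \ge 1)$ | $\mathrm{FWER} \le \mathrm{PFER}$ (by Markov) |
| FDR (False discovery) | $\mathbb{E}[V/\max(R, 1)]$ | FDR is the expectation of the proportion, not the mean number |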
FWER controls the probability of any false discovery, while PFER controls the expected number, often resulting in numerically close but conceptually distinct interpretations, especially when expected error counts are low. FDR, in contrast, is concerned with the average proportion of false discoveries among all rejections (0709.0366).
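To make the distinctions concrete, the sketch below estimates all four error rates from replicated counts of false rejections and total rejections. The function name `error_metrics` and the toy counts are hypothetical, chosen only for illustration.

```python
import numpy as np


def error_metrics(false_rejections, total_rejections, m):
    """Empirical PFER, PCER, FWER, and FDR from replicated experiments.

    false_rejections[k] and total_rejections[k] are the counts V and R
    observed in replicate k; m is the number of hypotheses per replicate.
    """
    v = np.asarray(false_rejections, dtype=float)
    r = np.asarray(total_rejections, dtype=float)
    fdp = v / np.maximum(r, 1.0)  # false discovery proportion, 0 when R = 0
    return {
        "PFER": v.mean(),          # expected number of false rejections
        "PCER": v.mean() / m,      # expected false rejections per test
        "FWER": (v >= 1).mean(),   # probability of at least one false rejection
        "FDR": fdp.mean(),         # expected proportion of false rejections
    }


# Toy counts from four hypothetical replicates of a 100-test experiment.
metrics = error_metrics([0, 1, 0, 2], [3, 4, 0, 5], m=100)
```

Because the number of false rejections never exceeds the number of rejections, each replicate satisfies proportion ≤ indicator ≤ count, so the estimates always obey FDR ≤ FWER ≤ PFER.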
4. Variance and Stability Characteristics
The Bonferroni rule not only tightly bounds the expected number of false rejections but also confers superior stability properties:
- For Bonferroni/PFER, the number of false rejections $V$ is a sum of indicators, one per true null, each comparing a $p$-value against the same fixed threshold.
- Simulation studies demonstrate that, for matched mean power, the Bonferroni rule yields substantially smaller standard deviation in both the number of true discoveries ($S = R - V$) and the total number of rejections ($R$) relative to the Benjamini–Hochberg (BH) method.
- The variability of both $S$ and $R$ increases significantly under moderate pairwise correlation among $p$-values, but Bonferroni remains more stable than BH, particularly in high-dimensional settings such as microarray analysis (0709.0366).
5. Impact of Dependence Among p-values
Bonferroni/PFER control is nonparametric: the bound $\mathbb{E}[V] \le \gamma$ on the expected number of false rejections, at target PFER level $\gamma$, holds for any dependence structure among $p$-values. In contrast, the BH/FDR procedure requires the positive regression dependence on a subset (PRDS) property for rigorous FDR control; under arbitrary dependence, the FDR bound is inflated by the harmonic factor $\sum_{i=1}^{m} 1/i$ for $m$ hypotheses, so the nominal level must be reduced accordingly. This distinction is critical in applied settings where correlations among test statistics are the rule rather than the exception (0709.0366).
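For comparison, here is a minimal sketch of the BH step-up rule with an optional harmonic correction for arbitrary dependence (the Benjamini–Yekutieli variant). The function name, the `arbitrary_dependence` flag, and the example p-values are assumptions made for illustration, not part of the cited work.

```python
import numpy as np


def bh_reject(pvals, alpha, arbitrary_dependence=False):
    """Benjamini-Hochberg step-up procedure at nominal FDR level alpha.

    With arbitrary_dependence=True, alpha is divided by the harmonic sum
    1 + 1/2 + ... + 1/m before the step-up rule is applied.
    """
    p = np.asarray(pvals, dtype=float)
    m = p.size
    if arbitrary_dependence:
        alpha = alpha / np.sum(1.0 / np.arange(1, m + 1))
    order = np.argsort(p)
    # Compare the k-th smallest p-value with alpha * k / m.
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()  # largest rank meeting its threshold
        reject[order[: k + 1]] = True   # step-up: reject everything up to rank k
    return reject


example_p = [0.9, 0.028, 0.005, 0.5, 0.025]
plain = bh_reject(example_p, alpha=0.05)
corrected = bh_reject(example_p, alpha=0.05, arbitrary_dependence=True)
```

In this toy input the plain rule rejects three hypotheses, including 0.025, which fails its own rank threshold but is carried along by the step-up rule. The harmonic correction lowers every threshold and rejects none, which shows the price of protection against arbitrary dependence.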
6. Simulation-based Assessment and Empirical Results
In simulations using $m$ hypotheses and both independent and exchangeably correlated $p$-values, two critical experimental designs were considered:
- "Equalized FDR": Adjusting the Bonferroni level $\gamma$ and the BH level $\alpha$ so that their true FDRs coincide, then comparing true discoveries and their standard deviations.
- "Equalized PFER": Matching PFER values and comparing performance.
Findings included near-identical mean power between Bonferroni and BH procedures when FDR is matched, but markedly smaller standard deviations of the number of true discoveries and of the total number of rejections for Bonferroni, indicating less variability and greater stability. The accompanying scatterplots showed high correlation (>0.98), yet Bonferroni exhibited lower variability. These results underscore the consistent stability advantage of the Bonferroni rule, particularly as dependence increases (0709.0366).
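A comparison of this kind can be sketched as follows. This is not the simulation from the cited paper: the equicorrelated Gaussian model, the one-sided tests, and every default value (`m`, `m1`, `effect`, `rho`, `gamma`, `alpha`, `n_rep`) are hypothetical choices, and the two levels are not calibrated to equalize FDR or PFER. The code only reports the mean and standard deviation of true discoveries and total rejections for each procedure.

```python
import math
import numpy as np

_erfc = np.vectorize(math.erfc)


def one_sided_pvalues(z):
    """Upper-tail standard normal p-values, 1 - Phi(z)."""
    return 0.5 * _erfc(z / math.sqrt(2.0))


def bonferroni_reject(p, gamma):
    return p <= gamma / p.size


def bh_reject(p, alpha):
    m = p.size
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        reject[order[: np.nonzero(below)[0].max() + 1]] = True
    return reject


def compare_stability(m=500, m1=50, effect=3.0, rho=0.4,
                      gamma=2.0, alpha=0.05, n_rep=300, seed=0):
    """Mean and SD of true discoveries and rejections for both rules.

    The first m1 hypotheses are non-null with mean shift `effect`; all test
    statistics share pairwise correlation rho (exchangeable dependence).
    """
    rng = np.random.default_rng(seed)
    mu = np.zeros(m)
    mu[:m1] = effect
    counts = {"bonferroni": [], "bh": []}
    for _ in range(n_rep):
        shared = rng.standard_normal()  # common factor inducing correlation
        z = mu + math.sqrt(rho) * shared + math.sqrt(1 - rho) * rng.standard_normal(m)
        p = one_sided_pvalues(z)
        for name, rej in (("bonferroni", bonferroni_reject(p, gamma)),
                          ("bh", bh_reject(p, alpha))):
            counts[name].append((rej[:m1].sum(), rej.sum()))
    summary = {}
    for name, rows in counts.items():
        a = np.array(rows, dtype=float)
        summary[name] = {
            "mean_S": a[:, 0].mean(), "sd_S": a[:, 0].std(ddof=1),
            "mean_R": a[:, 1].mean(), "sd_R": a[:, 1].std(ddof=1),
        }
    return summary


summary = compare_stability(n_rep=50)
```

To mimic the "equalized" designs, one would tune `gamma` and `alpha` until the estimated FDR (or PFER) of the two rules agrees, then compare the standard deviation entries.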
7. Practical Recommendations in Large-Scale Testing
- In large-scale settings, such as microarray studies, where the permissible average number of false positives per experiment, $\gamma$, is known (e.g., up to 5 for a study of $m$ genes), the Bonferroni procedure with threshold $\gamma/m$ ensures $\mathbb{E}[V] \le \gamma$, where $V$ is the number of false rejections.
- If the bound is required on the proportion of false positives, the BH method at level $\alpha$ is appropriate, though it shows more variability, especially under dependence.
- Bonferroni/PFER is frequently simpler to communicate (“on average we get at most $\gamma$ false genes”) and its nonparametric control is robust to any dependence structure.
- To optimize stability, one can scan a grid of $\gamma$ values and empirically assess the variance of the number of true discoveries or of the total number of rejections (e.g., via permutation or bootstrap), choosing the threshold with minimal variance.
- In domains valuing reproducibility, such as genomics, a small PFER (e.g., 1 or 2) can provide a balance between statistical power and stability (0709.0366).
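The grid-scan recommendation above can be sketched with a simple bootstrap. Everything here is an assumption made for illustration: a two-group design with samples in rows and genes in columns, Welch statistics converted to p-values with a normal approximation, resampling of samples within each group, and the hypothetical function names.

```python
import math
import numpy as np

_erfc = np.vectorize(math.erfc)


def welch_pvalues(x, y):
    """Two-sided per-gene p-values from Welch statistics (normal approximation)."""
    se = np.sqrt(x.var(axis=0, ddof=1) / x.shape[0] + y.var(axis=0, ddof=1) / y.shape[0])
    t = (x.mean(axis=0) - y.mean(axis=0)) / se
    return _erfc(np.abs(t) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|t|))


def bootstrap_sd_of_rejections(x, y, gammas, n_boot=200, seed=0):
    """Bootstrap mean and SD of the Bonferroni rejection count for each gamma."""
    rng = np.random.default_rng(seed)
    m = x.shape[1]
    counts = np.empty((n_boot, len(gammas)))
    for b in range(n_boot):
        # Resample samples (rows) with replacement within each group.
        xb = x[rng.integers(0, x.shape[0], size=x.shape[0])]
        yb = y[rng.integers(0, y.shape[0], size=y.shape[0])]
        p = welch_pvalues(xb, yb)
        counts[b] = [(p <= g / m).sum() for g in gammas]
    return counts.mean(axis=0), counts.std(axis=0, ddof=1)


# Synthetic data: 12 samples per group, 200 genes, the first 20 shifted.
data_rng = np.random.default_rng(1)
x = data_rng.standard_normal((12, 200))
y = data_rng.standard_normal((12, 200))
y[:, :20] += 2.0
gammas = [0.5, 1.0, 2.0, 5.0]
mean_R, sd_R = bootstrap_sd_of_rejections(x, y, gammas, n_boot=50)
```

The standard deviations are best read alongside the mean rejection counts, since a very small level lowers variability partly by rejecting almost nothing; this is the balance between power and stability noted in the last recommendation.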