
Mean False Error (MFE)

Updated 8 December 2025
  • Mean false error (MFE) is the expected number of false rejections in a family of tests, providing a clear average-case error measure in large-scale experiments.
  • The Bonferroni procedure controls PFER by setting a stringent threshold on p-values, ensuring nonparametric control even under dependence.
  • MFE is closely related to metrics like PCER, FWER, and FDR, with its stability and low variance offering practical advantages in high-dimensional testing settings.

The mean false error (MFE), also known as the per-family error rate (PFER), is a fundamental metric in multiple hypothesis testing. It quantifies the expected number of false rejections—incorrectly rejecting true null hypotheses—within a family of simultaneous tests. MFE is formally defined as the expected value of the number of false discoveries, providing a clear average-case measure of error in large-scale testing settings, such as those found in genomics and microarray experiments (0709.0366).

1. Formal Definition and Notation

Let $m$ denote the total number of hypotheses tested, $m_0$ the number of true null hypotheses, $V$ the number of false rejections, and $R$ the total number of rejections. The mean false error (MFE) is defined as

$$\text{PFER} = \mathbb{E}[V].$$

This expectation is taken over the joint distribution of the test statistics or $p$-values. The terminology "mean false error" (MFE) is entirely synonymous with PFER in this context (0709.0366).
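
As a concrete illustration of the notation, the following Python sketch simulates a single family of tests and counts $V$ and $R$ at a fixed threshold; the values of $m$ and $m_0$, the Beta model for non-null $p$-values, and the threshold are illustrative assumptions, not settings from the cited paper.

```python
import numpy as np

rng = np.random.default_rng(1)

m, m0 = 1000, 900                 # hypotheses tested and true nulls (illustrative values)
threshold = 0.05 / m              # an arbitrary fixed per-test rejection threshold

p_null = rng.uniform(size=m0)              # true-null p-values: Uniform(0, 1)
p_alt = rng.beta(0.1, 1.0, size=m - m0)    # non-null p-values: skewed toward 0 (assumed model)
p = np.concatenate([p_null, p_alt])

reject = p <= threshold
V = int(reject[:m0].sum())        # false rejections (true nulls that were rejected)
R = int(reject.sum())             # total rejections
print(f"V = {V}, R = {R}")        # PFER = E[V] is the average of V over repeated experiments
```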

2. Bonferroni Procedure for PFER Control

The Bonferroni procedure provides a direct approach to controlling the PFER at a user-specified level $\alpha$. It rejects hypothesis $i$ if its $p$-value satisfies $P_i \leq \alpha/m$. Without any assumptions on the dependence structure among the $p$-values, the following bound holds:

$$\mathbb{E}[V] = \sum_{i \in T} \Pr\{\text{reject } H_i\} \leq m_0 \cdot (\alpha/m) \leq \alpha,$$
where $T$ denotes the index set of the $m_0$ true null hypotheses.

If all true null $p$-values are uniformly distributed, $\Pr\{P_i \leq \alpha/m\} = \alpha/m$, leading to

$$\text{PFER} = \mathbb{E}[V] = (m_0/m)\,\alpha \leq \alpha.$$

This property illustrates that the Bonferroni rule offers strong and uniform control of the expected number of false discoveries, regardless of dependence among the $p$-values (0709.0366).
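
A minimal simulation sketch of this control property is shown below (assuming NumPy and SciPy are available): all $m$ hypotheses are true nulls, the test statistics are equicorrelated Gaussians (a dependence structure chosen purely for illustration), and the Monte Carlo estimate of $\mathbb{E}[V]$ under the Bonferroni threshold $\alpha/m$ stays below $\alpha$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

def mean_false_errors(m=1000, rho=0.4, alpha=0.05, n_reps=5000):
    """Estimate E[V] for Bonferroni at level alpha when all m hypotheses are true
    nulls and the test statistics are equicorrelated Gaussians (illustrative)."""
    v = np.empty(n_reps)
    for r in range(n_reps):
        shared = rng.standard_normal()                   # common factor inducing correlation rho
        z = np.sqrt(rho) * shared + np.sqrt(1 - rho) * rng.standard_normal(m)
        p = 2 * norm.sf(np.abs(z))                       # two-sided p-values, Uniform(0,1) marginally
        v[r] = np.sum(p <= alpha / m)                    # false rejections under Bonferroni
    return v.mean()

print(mean_false_errors())   # here m0 = m, so the estimate should be close to alpha = 0.05
```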

3. Relationship to Other Error Metrics

The MFE/PFER is interconnected with several other widely used error rates:

| Metric | Definition | Relationship to PFER |
| --- | --- | --- |
| PCER (per-comparison error rate) | $\mathbb{E}[V]/m$ | $\text{PFER} = m \cdot \text{PCER}$ |
| FWER (family-wise error rate) | $\Pr\{V \geq 1\}$ | $\text{FWER} \leq \text{PFER}$ (by Markov's inequality) |
| FDR (false discovery rate) | $\mathbb{E}[V/R]$ (with $V/R := 0$ when $R = 0$) | FDR is the expected proportion of false discoveries, not the expected count |

FWER controls the probability of any false discovery, while PFER controls the expected number of false discoveries; the two are numerically close when the expected error count is small, but their interpretations remain conceptually distinct. FDR, in contrast, is concerned with the expected proportion of false discoveries among all rejections (0709.0366).
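
The sketch below estimates all four quantities from the same set of simulated experiments at a fixed threshold, reusing the illustrative Uniform/Beta model from the earlier sketch; it is meant only to make the definitions in the table concrete.

```python
import numpy as np

rng = np.random.default_rng(3)

def error_metrics(threshold, m=1000, m0=900, n_reps=5000):
    """Monte Carlo estimates of PFER, PCER, FWER and FDR for a fixed threshold."""
    V = np.empty(n_reps)
    R = np.empty(n_reps)
    for r in range(n_reps):
        p = np.concatenate([rng.uniform(size=m0),               # true nulls
                            rng.beta(0.1, 1.0, size=m - m0)])   # non-nulls (assumed model)
        reject = p <= threshold
        V[r] = reject[:m0].sum()
        R[r] = reject.sum()
    fdp = np.where(R > 0, V / np.maximum(R, 1), 0.0)            # V/R with the convention 0/0 = 0
    return {"PFER": V.mean(),          # E[V]
            "PCER": V.mean() / m,      # E[V] / m
            "FWER": (V >= 1).mean(),   # Pr{V >= 1}
            "FDR": fdp.mean()}         # E[V/R]

print(error_metrics(0.05 / 1000))
```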

4. Variance and Stability Characteristics

The Bonferroni rule not only tightly bounds the expected number of false rejections but also confers superior stability properties:

  • For Bonferroni/PFER, $\mathrm{Var}(V) \leq \mathbb{E}[V]$, since $V$ is a sum of indicator variables.
  • Simulation studies demonstrate that, for matched mean power, the Bonferroni rule yields a substantially smaller standard deviation of both the number of false rejections ($\mathrm{SD}(V)$) and the total number of rejections ($\mathrm{SD}(R)$) than the Benjamini–Hochberg (BH) method (see the sketch after this list).
  • The variability of both $V$ and $R$ increases significantly under moderate pairwise correlation among the $p$-values (e.g., $\rho = 0.4$), but Bonferroni remains more stable than BH, particularly in high-dimensional settings such as microarray analysis (0709.0366).
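
A minimal sketch of such a stability comparison, assuming the same illustrative Uniform/Beta data-generating model as above; the BH step-up rule is implemented directly rather than taken from the cited paper, and the levels $\alpha$ and $q$ are not matched for power here, so only the relative sizes of the standard deviations are meaningful.

```python
import numpy as np

rng = np.random.default_rng(4)

def bh_reject(p, q):
    """Benjamini-Hochberg step-up rule at level q: reject every hypothesis whose
    p-value is at most the largest p_(k) satisfying p_(k) <= k*q/m."""
    m = len(p)
    p_sorted = np.sort(p)
    below = p_sorted <= q * np.arange(1, m + 1) / m
    if not below.any():
        return np.zeros(m, dtype=bool)
    cutoff = p_sorted[np.nonzero(below)[0].max()]
    return p <= cutoff

def stability(m=1000, m0=900, alpha=0.05, q=0.05, n_reps=2000):
    """Compare SD(V) and SD(R) for Bonferroni (PFER control) and BH (FDR control)
    under the illustrative data-generating model used throughout these sketches."""
    V = {"Bonferroni": [], "BH": []}
    R = {"Bonferroni": [], "BH": []}
    for _ in range(n_reps):
        p = np.concatenate([rng.uniform(size=m0),
                            rng.beta(0.1, 1.0, size=m - m0)])
        rejections = {"Bonferroni": p <= alpha / m, "BH": bh_reject(p, q)}
        for name, rej in rejections.items():
            V[name].append(rej[:m0].sum())
            R[name].append(rej.sum())
    for name in ("Bonferroni", "BH"):
        print(f"{name}: SD(V) = {np.std(V[name]):.2f}, SD(R) = {np.std(R[name]):.2f}")

stability()
```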

5. Impact of Dependence Among p-values

Bonferroni/PFER control is nonparametric: the extremal bound $\mathbb{E}[V] \leq (m_0/m)\,y$ holds for any dependence structure among the $p$-values when the rejection threshold is $y/m$. In contrast, the BH/FDR procedure requires the positive regression dependence on a subset (PRDS) property for rigorous FDR control; under arbitrary dependence, the FDR bound must be relaxed by a harmonic factor. This distinction is critical in applied settings where correlations among test statistics are the rule rather than the exception (0709.0366).
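
The harmonic factor referred to above is the Benjamini–Yekutieli correction $c(m) = \sum_{i=1}^{m} 1/i \approx \ln m + 0.577$: under arbitrary dependence, the BH procedure must be run at the deflated level $q/c(m)$ to guarantee FDR control at level $q$. The short sketch below shows how quickly this factor grows with $m$.

```python
import numpy as np

def harmonic_factor(m):
    """Benjamini-Yekutieli correction factor c(m) = sum_{i=1}^m 1/i."""
    return float(np.sum(1.0 / np.arange(1, m + 1)))

for m in (100, 1255, 10_000):
    c = harmonic_factor(m)
    print(f"m = {m:>6}: c(m) = {c:.2f}, so BH at q = 0.05 deflates to {0.05 / c:.5f}")
```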

6. Simulation-based Assessment and Empirical Results

In simulations using $m = 1{,}255$ hypotheses and both independent and exchangeably correlated $p$-values, two experimental designs were considered:

  • "Equalized FDR": Adjusting αBon\alpha_{\text{Bon}} and qBHq_{\text{BH}} so that their true FDRs coincide, then comparing true discoveries and their standard deviations.
  • "Equalized PFER": Matching PFER values and comparing performance.

Findings included near-identical mean power between the Bonferroni and BH procedures when FDR is matched, but markedly smaller $\mathrm{SD}(V)$ and $\mathrm{SD}(R)$ for Bonferroni, indicating less variability and greater stability. Scatterplots of $R_{\text{Bonf}}$ vs. $R_{\text{BH}}$ showed high correlation ($> 0.98$), yet Bonferroni exhibited lower variability. These results underscore the consistent stability advantage of the Bonferroni rule, particularly as dependence increases (0709.0366).
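
The exchangeable correlation structure described above can be generated with a single shared Gaussian factor; the sketch below produces one family of $m = 1{,}255$ two-sided $p$-values with pairwise correlation $\rho$ and a mean shift for the non-null hypotheses. The proportion of non-nulls and the effect size are illustrative assumptions, not the settings of the cited study.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)

def exchangeable_pvalues(m=1255, m0=1100, rho=0.4, effect=3.0):
    """One family of exchangeably correlated test statistics:
    z_i = sqrt(rho)*W + sqrt(1 - rho)*e_i, with a mean shift for the m - m0 non-nulls."""
    shared = rng.standard_normal()
    z = np.sqrt(rho) * shared + np.sqrt(1 - rho) * rng.standard_normal(m)
    z[m0:] += effect                       # non-null hypotheses get a shifted mean (assumed effect size)
    return 2 * norm.sf(np.abs(z))          # two-sided p-values

p = exchangeable_pvalues()
print(p[:5])
```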

7. Practical Recommendations in Large-Scale Testing

  • In large-scale settings, such as microarray studies, where the permissible average number of false positives per experiment is known (e.g., $y \approx 1$–$5$ for $10^4$ genes), the Bonferroni procedure with threshold $y/m$ ensures $\mathbb{E}[V] \leq y$.
  • If the bound is required on the proportion of false positives, the BH method at level $q$ is appropriate, though it confers more variability, especially under dependence.
  • Bonferroni/PFER is frequently simpler to communicate (“on average we get at most $y$ false genes”), and its nonparametric control is robust to any dependence structure.
  • To optimize stability, it is feasible to scan a grid of $y$ values and empirically assess $\mathrm{SD}(R)$ or $\mathrm{SD}(V)$ (e.g., via permutation or bootstrap), choosing the threshold with the smallest variance (see the sketch after this list).
  • In domains valuing reproducibility, such as genomics, a small PFER (e.g., 1 or 2) can provide a balance between statistical power and stability (0709.0366).
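
A minimal sketch of the grid scan mentioned in the list above, using a nonparametric bootstrap of an observed $p$-value vector to estimate $\mathrm{SD}(R)$ at each candidate PFER budget $y$; the $p$-value vector here is simulated purely as a stand-in for real experimental data.

```python
import numpy as np

rng = np.random.default_rng(6)

# Stand-in for an observed vector of p-values (in practice, taken from the experiment).
m, m0 = 10_000, 9_500
p_obs = np.concatenate([rng.uniform(size=m0), rng.beta(0.1, 1.0, size=m - m0)])

def sd_R_bootstrap(p, y, n_boot=1000):
    """Bootstrap estimate of SD(R) for the Bonferroni-type threshold y/m."""
    m = len(p)
    counts = [np.sum(rng.choice(p, size=m, replace=True) <= y / m) for _ in range(n_boot)]
    return float(np.std(counts))

for y in (1, 2, 5):                        # candidate PFER budgets
    print(f"y = {y}: estimated SD(R) = {sd_R_bootstrap(p_obs, y):.2f}")
```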