Equal Error Rate (EER) Overview
- Equal Error Rate (EER) is a scalar metric giving the error rate at the threshold where a classifier’s false acceptance and false rejection rates are equal, providing a clear measure of discriminatory power.
- EER is computed by iterating over sorted score thresholds to find the point where the False Acceptance Rate equals the False Rejection Rate, making it robust and prevalence-agnostic.
- EER is critical in biometric and security applications, yet its binary focus and insensitivity to calibration necessitate the use of complementary metrics for comprehensive evaluation.
Equal Error Rate (EER) is a scalar metric used to characterize the operational performance of binary classifiers that produce continuous-valued similarity or confidence scores. EER represents the error rate at the unique decision threshold where the probability of misclassifying a positive instance as negative (false rejection) equals the probability of misclassifying a negative instance as positive (false acceptance). Widely adopted as a standardized reference point in biometrics and related fields, EER is both threshold- and prevalence-agnostic, providing a concise summary of a system’s intrinsic discriminatory power under symmetrical error cost assumptions (Ferrer, 2022, Giot et al., 2012, Brümmer et al., 2021).
1. Formal Definition and Error Rate Symmetry
EER is defined in terms of two error rates for a continuous-score binary classifier: the False Acceptance Rate (FAR) and the False Rejection Rate (FRR). For any threshold \(\tau\):
- \(\mathrm{FAR}(\tau)\): the fraction of negative examples with score at or above \(\tau\), i.e., classified as positive.
- \(\mathrm{FRR}(\tau)\): the fraction of positive examples with score below \(\tau\), i.e., classified as negative.
The EER is attained at the threshold where FAR and FRR are equal, or as close as possible:
\[
\tau_{\mathrm{EER}} = \arg\min_{\tau}\,\bigl|\mathrm{FAR}(\tau) - \mathrm{FRR}(\tau)\bigr|, \qquad \mathrm{EER} = \tfrac{1}{2}\bigl(\mathrm{FAR}(\tau_{\mathrm{EER}}) + \mathrm{FRR}(\tau_{\mathrm{EER}})\bigr).
\]
This threshold corresponds to the operating point on the ROC or DET curve where the two error rates intersect, and formally provides a solution to the symmetric (equal-cost) error minimization problem (Ferrer, 2022, Brümmer et al., 2021, Friedman et al., 2019).
2. Statistical Properties and Theoretical Context
EER is formally independent of class priors, being based exclusively on class-conditional error rates. It is also invariant under monotonic transformations of the scoring function, relying solely on the ranking order of scores rather than their absolute calibration or interpretation as posteriors. This makes EER robust for prevalence-agnostic evaluation—a property critical in scenarios where class distributions vary over time or are unknown at deployment (Ferrer, 2022, Brümmer et al., 2021).
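Both properties are easy to check empirically. The sketch below (plain Python; the `eer` helper is an illustrative brute-force implementation, not taken from the cited works) confirms that the EER estimate is unchanged when every score is passed through a strictly increasing function such as the logistic sigmoid, since only the ranking of scores matters:

```python
import math

def eer(pos, neg):
    """Brute-force EER: scan all observed scores as thresholds and
    return the mean of FAR and FRR where |FAR - FRR| is smallest."""
    best = None
    for t in sorted(set(pos) | set(neg)):
        far = sum(s >= t for s in neg) / len(neg)   # negatives accepted
        frr = sum(s < t for s in pos) / len(pos)    # positives rejected
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)
    return best[1]

pos = [0.9, 0.8, 0.75, 0.6, 0.4]   # genuine (positive) scores
neg = [0.7, 0.5, 0.3, 0.2, 0.1]    # impostor (negative) scores
sig = lambda s: 1 / (1 + math.exp(-s))  # strictly increasing transform

# Monotone rescaling leaves the EER untouched
assert eer(pos, neg) == eer([sig(s) for s in pos], [sig(s) for s in neg])
```

The assertion holds exactly because a monotone transform maps each candidate threshold to a transformed threshold with identical FAR and FRR counts.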
Statistically, EER corresponds to the special case of the Bayes Expected Cost (Bayes EC, or Bayes risk) with equal costs for both error types and a prior of 0.5, although in general practice EER omits explicit cost and prevalence considerations. For a perfectly calibrated system, the minimum Bayes error rate \(E(\pi)\) at any prior \(\pi\) satisfies the "trapezium bound":
\[
E(\pi) \le \min\bigl(\pi,\; 1 - \pi,\; \mathrm{EER}\bigr).
\]
The maximum of \(E(\pi)\) across all priors coincides with the EER (Brümmer et al., 2021, Ferrer, 2022).
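The bound can be verified numerically for a simple calibrated model. The sketch below is an illustrative construction (not from the cited papers): scores are unit-variance Gaussians with means 0 (impostor) and d (genuine), the Bayes-optimal threshold for each prior follows from the log-likelihood-ratio test, and the resulting Bayes error stays inside the trapezium min(π, 1 − π, EER):

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

d = 2.0                      # separation between class means
eer = phi(-d / 2)            # error at the symmetric threshold d/2 (~0.159)

for i in range(1, 100):
    pi = i / 100             # prior probability of the positive class
    # Bayes-optimal threshold from the log-likelihood-ratio test
    tau = d / 2 + math.log((1 - pi) / pi) / d
    frr = phi(tau - d)       # positives (mean d) scoring below tau
    far = 1 - phi(tau)       # negatives (mean 0) scoring at/above tau
    bayes_err = pi * frr + (1 - pi) * far
    assert bayes_err <= min(pi, 1 - pi, eer) + 1e-12
```

The bound is tight at π = 0.5, where the Bayes error equals the EER.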
3. Computation and Algorithmic Considerations
The computation of EER consists of:
- Collecting genuine (positive) and impostor (negative) scores.
- Sorting all unique threshold candidates.
- Computing FAR and FRR at each threshold.
- Determining the threshold at which \(|\mathrm{FAR} - \mathrm{FRR}|\) is minimized and assigning EER as the intersection value (conventionally the mean of FAR and FRR there).
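The steps above can be vectorized; a minimal NumPy sketch (the function name is illustrative) evaluates FAR and FRR at every unique score in one pass with `searchsorted`:

```python
import numpy as np

def compute_eer(genuine, impostor):
    """EER via one sorted sweep: at each candidate threshold, FRR is the
    fraction of genuine scores below it and FAR the fraction of impostor
    scores at or above it."""
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    # searchsorted counts, for each threshold, how many scores fall below it
    frr = np.searchsorted(np.sort(genuine), thresholds, side="left") / len(genuine)
    far = 1 - np.searchsorted(np.sort(impostor), thresholds, side="left") / len(impostor)
    i = np.argmin(np.abs(far - frr))          # closest intersection point
    return (far[i] + frr[i]) / 2

rng = np.random.default_rng(0)
genuine = rng.normal(2.0, 1.0, 5000)          # positive-class scores
impostor = rng.normal(0.0, 1.0, 5000)         # negative-class scores
print(compute_eer(genuine, impostor))         # close to the theoretical Phi(-1) ~ 0.159
```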
For large datasets, brute-force enumeration over all possible thresholds becomes computationally prohibitive. Fast approximation algorithms exploit the monotonicity of the difference function \(D(\tau) = \mathrm{FAR}(\tau) - \mathrm{FRR}(\tau)\), implementing divide-and-conquer (polytomous search) to efficiently localize its zero crossing:
| Method | Typical EER error (%) | Typical time (ms) | Threshold evaluations |
|---|---|---|---|
| Brute-force (1000 thresholds) | 0.20 | 9310 | 1000 |
| Polytomous (p = 3) | 0.07–0.30 | 110–139 | 11–14 |
Such approaches provide 8×–80× speedups for EER calculation and are crucial for real-time systems and algorithms requiring frequent EER evaluation (e.g., genetic algorithm-based score fusion) (Giot et al., 2012).
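Because \(D(\tau)\) falls monotonically from +1 to −1 as the threshold sweeps the score range, its zero crossing can be localized without scanning every candidate. The sketch below uses plain bisection, i.e., the p = 2 special case of the p-ary polytomous search described by Giot et al.; the helper names are illustrative:

```python
def far_frr(t, genuine, impostor):
    """FAR and FRR at a single threshold t (one O(n) pass each)."""
    far = sum(s >= t for s in impostor) / len(impostor)
    frr = sum(s < t for s in genuine) / len(genuine)
    return far, frr

def eer_bisection(genuine, impostor, iters=30):
    """Bisect on D(t) = FAR(t) - FRR(t), which is non-increasing in t."""
    lo, hi = min(impostor + genuine), max(impostor + genuine)
    for _ in range(iters):
        mid = (lo + hi) / 2
        far, frr = far_frr(mid, genuine, impostor)
        if far > frr:      # D(mid) > 0: the crossing lies to the right
            lo = mid
        else:
            hi = mid
    far, frr = far_frr((lo + hi) / 2, genuine, impostor)
    return (far + frr) / 2

genuine = [0.9, 0.8, 0.75, 0.6, 0.4]
impostor = [0.7, 0.5, 0.3, 0.2, 0.1]
print(eer_bisection(genuine, impostor))  # 0.2
```

Each bisection step still costs one FAR/FRR evaluation, but only ~log2 of the threshold count is needed, which is the source of the reported speedups.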
4. Comparative Analysis with Other Metrics
EER serves as a point metric, in contrast to metrics integrating over the full spectrum of thresholds, such as Area Under the ROC Curve (AUC). Whereas AUC measures the global ranking performance of the classifier, EER specifies operational performance at a single, critical, symmetric operating point.
EER ignores calibration, in contrast to Cross-Entropy, Brier Score, and Bayes EC, which evaluate both ranking and the degree to which scores correspond to true probabilities of class membership. While EER remains robust to calibration errors, it may fail to reveal overconfident or poorly calibrated models. Furthermore, EER does not generalize to multiclass classification except by reduction to pairwise class evaluations, a process lacking a unified decision-theoretic foundation (Ferrer, 2022, Brümmer et al., 2021).
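The point-versus-global distinction is concrete in code: for the same scores, AUC (computed below via the Mann-Whitney statistic) aggregates ranking quality over all thresholds, while EER reports error at one symmetric operating point, and the two generally differ. This is an illustrative toy example, not drawn from the cited works:

```python
def auc(pos, neg):
    # Mann-Whitney form: P(random positive outscores a random negative)
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def eer(pos, neg):
    # Point metric: FAR/FRR balance at the single best threshold
    cands = sorted(set(pos) | set(neg))
    pairs = [(sum(s >= t for s in neg) / len(neg),
              sum(s < t for s in pos) / len(pos)) for t in cands]
    far, frr = min(pairs, key=lambda p: abs(p[0] - p[1]))
    return (far + frr) / 2

pos = [0.9, 0.8, 0.75, 0.6, 0.4]
neg = [0.7, 0.5, 0.3, 0.2, 0.1]
print(auc(pos, neg), eer(pos, neg))  # 0.88 0.2
```

Here the classifier ranks a random positive above a random negative 88% of the time, yet at its balanced operating point it still commits 20% of each error type.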
5. Practical Application Domains
EER is integral in biometric verification (e.g., fingerprint, face, speaker verification) and security domains, where it is frequently adopted as a benchmark for system comparison under the ISO/IEC 19795-1 standard. In these applications, EER provides a compact summary of the trade-off between security (minimizing FAR) and user convenience (minimizing FRR).
A key engineering question in biometric system design is: "How many independent features of a given temporal persistence are required to achieve a desired EER?" For Gaussian, uncorrelated features, the logarithm of the required number of features is linearly related to the features' intraclass correlation coefficient (ICC) at each fixed EER target, so the feature count needed for, e.g., a target EER of 1% at ICC = 0.8 can be read directly off this relationship.
This enables rapid, pre-experimental planning of system requirements (Friedman et al., 2019).
6. EER in Algorithmic Fairness and Group Disparity Context
In algorithmic fairness, the "equal error rate" concept is related to group-based error parity, where a classifier must achieve identical false positive and false negative rates across defined subgroups. However, a formal impossibility result shows that, except under specific base-rate conditions, it is not possible to simultaneously achieve both calibration (predictive parity within each group) and equal error rates for a single score output (Reich et al., 2020). Newer methods circumvent this by splitting the tasks—enforcing calibration on scores, and error-rate equality on decisions—using linear programs to optimize for accuracy subject to group-level constraints.
Empirically, such procedures have been shown to remove error disparities while maintaining calibration in high-stakes applications (e.g., recidivism prediction and credit lending), outperforming baseline methods that only omit sensitive features or enforce opportunity-based fairness without calibration (Reich et al., 2020).
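A minimal sketch of the quantity such constraints target is the per-group false positive and false negative rates at a shared decision threshold. The function, data, and group labels below are synthetic illustrations, not the Reich et al. procedure itself:

```python
def group_error_rates(scores, labels, groups, threshold):
    """Per-group (FPR, FNR) at one shared decision threshold.
    Assumes each group contains both positive and negative examples."""
    rates = {}
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        neg = [i for i in idx if labels[i] == 0]
        pos = [i for i in idx if labels[i] == 1]
        fpr = sum(scores[i] >= threshold for i in neg) / len(neg)
        fnr = sum(scores[i] < threshold for i in pos) / len(pos)
        rates[g] = (fpr, fnr)
    return rates

scores = [0.2, 0.6, 0.7, 0.9, 0.1, 0.3, 0.8, 0.4]
labels = [0, 0, 1, 1, 0, 0, 1, 1]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
rates = group_error_rates(scores, labels, groups, 0.5)
# In this toy data, group "A" shows a higher FPR and group "B" a higher FNR,
# exactly the kind of gap that group error-parity constraints drive to zero.
```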
7. Limitations and Considerations
EER is restricted to binary evaluation and is sensitive to the score distributions’ overlap. It does not account for differing costs associated with errors unless used as part of a broader Bayes decision-theoretic framework. It also disregards score calibration, making it unsuitable where the scores’ probabilistic interpretation is operationally critical. In fairness contexts, EER-only constraints may incur trade-offs with calibration and group-level predictive validity. In practice, EER should be complemented with metrics that directly consider costs, priors, calibration, and group-level equity, especially in regulated or high-stakes systems (Ferrer, 2022, Brümmer et al., 2021, Reich et al., 2020).