Kling-Gupta Efficiency (KGE)
- KGE is a scalar hydrological metric that combines Pearson correlation, variability, and bias ratios into one score to assess model-observation agreement.
- It enables clear diagnostic analysis by isolating timing, amplitude, and bias errors, guiding targeted improvements in predictive models.
- KGE is applied in cross-validation and benchmarking studies, offering enhanced interpretability over traditional metrics like Nash–Sutcliffe Efficiency.
Kling-Gupta Efficiency (KGE) is a scalar performance metric for evaluating the agreement between observed and simulated time series in hydrological modeling, placing balanced emphasis on correlation, variability, and bias. Developed to address interpretability challenges in classical metrics such as Nash-Sutcliffe Efficiency (NSE), KGE synthesizes three components (linear correlation, variability ratio, and bias ratio) into a single, physically meaningful score that enables nuanced diagnostic and comparative analysis of predictive models across diverse regions and data scenarios (Shi, 2024; Feng et al., 2020; Wang et al., 1 Feb 2026).
1. Formal Definition and Mathematical Formulation
Kling-Gupta Efficiency in its standard form is defined as:

$$\mathrm{KGE} = 1 - \sqrt{(r - 1)^2 + (\alpha - 1)^2 + (\beta - 1)^2}$$

with the terms specified as:
- $r$: Pearson correlation coefficient between simulated and observed time series (e.g., discharge, precipitation, evapotranspiration).
- $\alpha$ (variability ratio): typically $\alpha = \sigma_{\mathrm{sim}}/\sigma_{\mathrm{obs}}$, where $\sigma$ denotes standard deviation.
- $\beta$ (bias ratio): usually $\beta = \mu_{\mathrm{sim}}/\mu_{\mathrm{obs}}$, with $\mu$ denoting mean value.
Alternative notations include $\gamma$ for the variability ratio (e.g., in (Wang et al., 1 Feb 2026)). All terms default to unity in the case of a perfect model-observation match, such that $\mathrm{KGE} = 1$ indicates perfect agreement, while deviations in any component decrease the score. The metric is unweighted (all components contribute equally), though the literature notes variant definitions with component renaming or reordering.
A variant expression, as found in (Shi, 2024), may present $\alpha$ and $\beta$ as explicit ratios or rearrange the order of subtraction within the squared terms under the radical; however, the implementation relies on the principal definition above.
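The canonical definition above translates directly into a few lines of code. The following is a minimal sketch (function and variable names are illustrative, not taken from any cited study) computing KGE and its three components with NumPy:

```python
import numpy as np

def kge(sim, obs):
    """Kling-Gupta Efficiency and its three components.

    Returns (kge, r, alpha, beta) where
      r     = Pearson correlation between sim and obs,
      alpha = std(sim) / std(obs)   (variability ratio),
      beta  = mean(sim) / mean(obs) (bias ratio).
    """
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    r = np.corrcoef(sim, obs)[0, 1]
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    score = 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
    return score, r, alpha, beta

# For a perfect model-observation match all components equal 1 and KGE = 1.
obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
score, r, alpha, beta = kge(obs, obs)
```

Any departure of a component from unity pulls the score below 1, consistent with the definition above.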
2. Component Interpretation and Diagnostic Value
KGE's construction permits interpretation and troubleshooting of prediction errors along three axes:
- Correlation term ($r$): Quantifies linear association between simulated and observed series. $r = 1$ indicates perfect correlation; $r < 1$ signals increasing scatter or phase mismatches.
- Variability ratio ($\alpha$, alternatively $\gamma$): Measures the model's reproduction of observed variability. Values above unity indicate overdispersion (“noisy” simulation), values below unity indicate insufficient variability (“overly smooth” outputs).
- Bias ratio ($\beta$): Reflects systematic shift in the mean. $\beta > 1$ corresponds to positive bias (overprediction); $\beta < 1$ indicates negative bias (underprediction).
By comparing each component to its ideal value of 1, KGE enables practitioners to localize deficiencies: poor correlation often reflects timing errors or omitted dynamics, a poor variability ratio points to amplitude misrepresentation, and a deviant bias ratio indicates a systematic offset in the mean (Shi, 2024, Wang et al., 1 Feb 2026).
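This diagnostic localization can be demonstrated on synthetic data: a time-shifted copy of a periodic signal has exactly the same mean and standard deviation as the original, so the variability and bias ratios stay at unity and the entire KGE deficit shows up in the correlation term. A minimal sketch (signal and names are illustrative):

```python
import numpy as np

def kge_components(sim, obs):
    r = np.corrcoef(sim, obs)[0, 1]   # timing / shape agreement
    alpha = sim.std() / obs.std()     # variability ratio
    beta = sim.mean() / obs.mean()    # bias ratio
    score = 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
    return score, r, alpha, beta

t = np.arange(100)
obs = 10.0 + np.sin(2 * np.pi * t / 100)  # synthetic seasonal signal (offset avoids zero mean)
sim = np.roll(obs, 10)                    # same signal, shifted in time

score, r, alpha, beta = kge_components(sim, obs)
# alpha = 1 and beta = 1 exactly (a shift preserves mean and spread),
# so the whole deficit is attributable to r = cos(2*pi*10/100) ~ 0.809.
```

Here a low score paired with unit ratios immediately flags a pure timing error, the first diagnostic axis above.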
3. Methodologies for KGE Computation in Hydrological Evaluation
Typical workflows for applying KGE encompass the following procedure, as exemplified in recent studies:
- Data Aggregation: Compile observed and simulated time series (e.g., daily ET, discharge, precipitation) across sites, basins, or stations. Resampling (e.g., daily mean or sum) is often used for consistency (Feng et al., 2020, Wang et al., 1 Feb 2026).
- Cross-Validation: Employ site-based or region-based holdout tests, such as leave-one-out (LOO) or holdout-by-region, to evaluate generalizability and extrapolation under ungauged or unseen conditions (Shi, 2024, Feng et al., 2020).
- Component Calculation: At each evaluation unit (e.g., site, basin, gauge), compute $r$, $\alpha$ (or $\gamma$), and $\beta$ over the test period.
- KGE Synthesis: Combine terms using the canonical formula for each evaluation unit.
- Summary Statistic: Aggregate results (median, boxplot distribution) across the full population of evaluation units for comparative assessment (Feng et al., 2020, Wang et al., 1 Feb 2026).
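The workflow above can be sketched end-to-end on synthetic data. The per-site series, site names, and distributions below are purely illustrative stand-ins for real gauge or basin records:

```python
import numpy as np

def kge(sim, obs):
    # Canonical KGE from the three components
    r = np.corrcoef(sim, obs)[0, 1]
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

rng = np.random.default_rng(0)

# Data aggregation: hypothetical site -> (simulated, observed) daily series.
sites = {}
for i in range(10):
    obs = rng.gamma(2.0, 2.0, size=365)  # synthetic daily discharge-like values
    sim = obs * rng.uniform(0.8, 1.2) + rng.normal(0.0, 0.5, size=365)
    sites[f"site_{i}"] = (sim, obs)

# Component calculation + KGE synthesis per evaluation unit.
scores = {name: kge(sim, obs) for name, (sim, obs) in sites.items()}

# Summary statistic across the population of evaluation units.
median_kge = float(np.median(list(scores.values())))
```

In a real study the test-period series would come from a cross-validation split (e.g., LOO or region holdout) rather than a random generator, and the per-site distribution would typically be reported as a boxplot alongside the median.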
4. Empirical Results and Comparative Performance Assessment
KGE is extensively used to benchmark hydrological and precipitation models against observational data at scale:
| Study/Model | Evaluation Set | Median KGE | Key Findings |
|---|---|---|---|
| (Shi, 2024) DANN | Global ET (LOO CV, 129 sites) | Up to >0.8 | DANN increases KGE by 0.2–0.3 over LOO-RF; >0.2 gain for forests |
| (Feng et al., 2020) FDC-LSTM | US basins (PUR, 7 regions) | 0.556–0.619 | Sparse FDC boosts KGE by ~0.05, ensemble eliminates KGE<0 cases |
| (Wang et al., 1 Feb 2026) MSWEP V3 | Global precipitation, 15,958 gauges | 0.69 | Outperforms ERA5 (0.61), IMERG-L V7 (0.46), GSMaP (0.38), CHIRP (0.31) |
Reported improvements in KGE reflect not just accuracy gains but enhanced robustness, particularly the reduction or elimination of catastrophic prediction failures (sites/basins with $\mathrm{KGE} < 0$), a notable advantage in ungauged or extrapolative scenarios (Shi, 2024, Feng et al., 2020).
5. Comparison to Alternative Metrics and Benefits
Nash–Sutcliffe Efficiency (NSE), the classical choice for model evaluation, is defined as $\mathrm{NSE} = 1 - \sum_t (y_{\mathrm{sim},t} - y_{\mathrm{obs},t})^2 / \sum_t (y_{\mathrm{obs},t} - \bar{y}_{\mathrm{obs}})^2$, which aggregates total squared error without distinguishing underlying error components. This blending means a large error in bias, variance, or correlation can dominate the score, confounding physical interpretation. KGE improves on NSE by isolating and penalizing mismatches in correlation, bias, and variability explicitly, allowing direct diagnosis of which deficiency drives model-observation mismatch (Wang et al., 1 Feb 2026). This balanced approach has contributed to KGE's widespread adoption in hydrological evaluation and multi-model benchmarking (Feng et al., 2020).
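The contrast can be made concrete with a purely biased simulation: both metrics penalize it, but only the KGE decomposition identifies the error as bias ($r = \alpha = 1$, $\beta \neq 1$). A minimal sketch with illustrative data:

```python
import numpy as np

def nse(sim, obs):
    # Nash-Sutcliffe: 1 minus squared error normalized by observed spread
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

def kge_components(sim, obs):
    r = np.corrcoef(sim, obs)[0, 1]
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    score = 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
    return score, r, alpha, beta

obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
sim = obs + 1.0  # purely biased simulation: correct timing and variability

score_nse = nse(sim, obs)                           # a single blended penalty (0.5)
score_kge, r, alpha, beta = kge_components(sim, obs)
# r = 1 and alpha = 1, so the KGE deficit is attributable entirely to beta = 4/3.
```

NSE reports only that the fit is imperfect; the KGE components state *why*, which is the diagnostic advantage discussed above.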
6. Best Practices and Implementation Guidance
Effective usage of KGE requires adherence to the following recommendations:
- Component Disclosure: Always report individual values of $r$, $\alpha$ (or $\gamma$), and $\beta$ alongside KGE to clarify error source(s) (Shi, 2024, Wang et al., 1 Feb 2026).
- Cross-Validation Protocol: For extrapolation or ungauged testing, use robust CV schemes (LOO, region holdout) and analyze KGE distributions per evaluation unit (Shi, 2024, Feng et al., 2020).
- Definition Consistency: Specify the precise definition (including the order of numerator/denominator in the ratios) and the use of the square root, as minor implementation differences can yield discrepancies (Shi, 2024).
- Model Selection: KGE serves as both an overall metric and a loss/validation criterion in hyperparameter tuning, architecture search, or ensemble construction (Shi, 2024, Feng et al., 2020).
- Diagnostic Use: Decompose low KGE to inform targeted model improvements: enhancing data-driven feature extraction (to improve $r$), regularizing output variance (to tune $\alpha$/$\gamma$), or correcting systematic bias (to refine $\beta$) (Shi, 2024, Feng et al., 2020).
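As an illustration of the model-selection recommendation, the sketch below picks a hyperparameter by maximizing validation KGE. The model family (a simple gain applied to observations plus noise) and all names are hypothetical, chosen only to keep the example self-contained:

```python
import numpy as np

def kge(sim, obs):
    r = np.corrcoef(sim, obs)[0, 1]
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

rng = np.random.default_rng(42)
obs = rng.gamma(2.0, 2.0, size=500)          # synthetic validation observations
noise = rng.normal(0.0, 0.3, size=obs.size)  # fixed noise realization

# Hypothetical model family: sim = gain * obs + noise.
# Select the gain that maximizes validation KGE.
candidates = [0.5, 0.8, 1.0, 1.2, 1.5]
best_gain = max(candidates, key=lambda g: kge(g * obs + noise, obs))
```

Wrong gains inflate both $\alpha$ and $\beta$ away from unity, so the unbiased gain wins; the same pattern applies when KGE is used as a validation criterion in hyperparameter tuning or ensemble construction.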
7. Application Contexts and Recent Innovations
KGE is routinely used in current hydrology and geoscience machine learning benchmark suites. For instance, domain-adversarial neural networks (DANN) improve KGE significantly, demonstrating the utility of domain adaptation in enhancing model extrapolability, especially for ungauged basins or regions with unique biogeographic characteristics (Shi, 2024). Ensembling across input-combinations and assimilating auxiliary information (e.g., flow duration curves) mitigate catastrophic prediction failures, raising both median and lower-quantile KGE values (Feng et al., 2020). In global-scale precipitation analysis, KGE enables rigorous cross-product benchmarks and guides gauge-correction strategies (Wang et al., 1 Feb 2026).
A plausible implication is that future metric refinement will continue emphasizing explainable, multidimensional diagnostic measures. KGE’s decomposition facilitates physical insight, while its scalar nature permits succinct reporting.
References: Shi (2024); Feng et al. (2020); Wang et al. (1 Feb 2026).