Kling-Gupta Efficiency (KGE)
- KGE is a scalar hydrological metric that combines Pearson correlation, variability, and bias ratios into one score to assess model-observation agreement.
- It enables clear diagnostic analysis by isolating timing, amplitude, and bias errors, guiding targeted improvements in predictive models.
- KGE is applied in cross-validation and benchmarking studies, offering enhanced interpretability over traditional metrics like Nash–Sutcliffe Efficiency.
Kling-Gupta Efficiency (KGE) is a scalar performance metric for evaluating the agreement between observed and simulated time series in hydrological modeling, placing balanced emphasis on correlation, variability, and bias. Developed to address interpretability challenges in classical metrics such as Nash-Sutcliffe Efficiency (NSE), KGE synthesizes three components (linear correlation, variability ratio, and bias ratio) into a single, physically meaningful score that enables nuanced diagnostic and comparative analysis of predictive models across diverse regions and data scenarios (Shi, 2024; Feng et al., 2020; Wang et al., 1 Feb 2026).
1. Formal Definition and Mathematical Formulation
Kling-Gupta Efficiency in its standard form is defined as:

$$\mathrm{KGE} = 1 - \sqrt{(r - 1)^2 + (\alpha - 1)^2 + (\beta - 1)^2}$$

with the terms specified as:
- $r$: Pearson correlation coefficient between simulated and observed time series (e.g., discharge, precipitation, evapotranspiration).
- $\alpha$ (variability ratio): typically $\alpha = \sigma_{\mathrm{sim}}/\sigma_{\mathrm{obs}}$, where $\sigma$ denotes standard deviation.
- $\beta$ (bias ratio): usually $\beta = \mu_{\mathrm{sim}}/\mu_{\mathrm{obs}}$, with $\mu$ denoting mean value.
Alternative notations include $\gamma$ for the variability ratio (e.g., in (Wang et al., 1 Feb 2026)). All terms default to unity in the case of a perfect model-observation match, such that $\mathrm{KGE} = 1$ indicates perfect agreement, while deviations in any component decrease the score. The metric is unweighted (all components contribute equally), though the literature notes variant definitions with component renaming or reordering.
A variant expression, as found in (Shi, 2024), may present $\alpha$ and $\beta$ as explicit ratios or rearrange the order of subtraction within the squared terms under the radical; however, the implementation relies on the principal definition above.
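The canonical definition above translates directly into a few lines of code. The following is a minimal sketch (function and variable names are illustrative, not taken from any cited study) computing KGE and its three components with NumPy:

```python
import numpy as np

def kge(sim, obs):
    """Kling-Gupta Efficiency and its three components.

    Returns (kge, r, alpha, beta) where
      r     = Pearson correlation between sim and obs,
      alpha = std(sim) / std(obs)   (variability ratio),
      beta  = mean(sim) / mean(obs) (bias ratio).
    """
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    r = np.corrcoef(sim, obs)[0, 1]
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    score = 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
    return score, r, alpha, beta

# For a perfect model-observation match all components equal 1 and KGE = 1.
obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
score, r, alpha, beta = kge(obs, obs)
```

Any departure of a component from unity pulls the score below 1, consistent with the definition above.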
2. Component Interpretation and Diagnostic Value
KGE's construction permits interpretation and troubleshooting of prediction errors along three axes:
- Correlation term ($r$): Quantifies linear association between simulated and observed series. $r = 1$ indicates perfect correlation; $r < 1$ signals increasing scatter or phase mismatches.
- Variability ratio ($\alpha$, alternatively $\gamma$): Measures the model's reproduction of observed variability. Values above unity indicate overdispersion (“noisy” simulation), values below unity indicate insufficient variability (“overly smooth” outputs).
- Bias ratio ($\beta$): Reflects systematic shift in the mean. $\beta > 1$ corresponds to positive bias (overprediction); $\beta < 1$ indicates negative bias (underprediction).
By comparing each component to its ideal value of 1, KGE enables practitioners to localize deficiencies: poor correlation often reflects timing errors or omitted dynamics, a poor variability ratio points to amplitude misrepresentation, and a deviant bias ratio indicates a systematic offset in the mean (Shi, 2024, Wang et al., 1 Feb 2026).
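This diagnostic localization can be demonstrated on synthetic data: a time-shifted copy of a periodic signal has exactly the same mean and standard deviation as the original, so the variability and bias ratios stay at unity and the entire KGE deficit shows up in the correlation term. A minimal sketch (signal and names are illustrative):

```python
import numpy as np

def kge_components(sim, obs):
    r = np.corrcoef(sim, obs)[0, 1]   # timing / shape agreement
    alpha = sim.std() / obs.std()     # variability ratio
    beta = sim.mean() / obs.mean()    # bias ratio
    score = 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
    return score, r, alpha, beta

t = np.arange(100)
obs = 10.0 + np.sin(2 * np.pi * t / 100)  # synthetic seasonal signal (offset avoids zero mean)
sim = np.roll(obs, 10)                    # same signal, shifted in time

score, r, alpha, beta = kge_components(sim, obs)
# alpha = 1 and beta = 1 exactly (a shift preserves mean and spread),
# so the whole deficit is attributable to r = cos(2*pi*10/100) ~ 0.809.
```

Here a low score paired with unit ratios immediately flags a pure timing error, the first diagnostic axis above.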
3. Methodologies for KGE Computation in Hydrological Evaluation
Typical workflows for applying KGE encompass the following procedure, as exemplified in recent studies:
- Data Aggregation: Compile observed and simulated time series (e.g., daily ET, discharge, precipitation) across sites, basins, or stations. Resampling (e.g., daily mean or sum) is often used for consistency (Feng et al., 2020, Wang et al., 1 Feb 2026).
- Cross-Validation: Employ site-based or region-based holdout tests, such as leave-one-out (LOO) or holdout-by-region, to evaluate generalizability and extrapolation under ungauged or unseen conditions (Shi, 2024, Feng et al., 2020).
- Component Calculation: At each evaluation unit (e.g., site, basin, gauge), compute $r$, $\alpha$ (or $\gamma$), and $\beta$ over the test period.
- KGE Synthesis: Combine terms using the canonical formula for each evaluation unit.
- Summary Statistic: Aggregate results (median, boxplot distribution) across the full population of evaluation units for comparative assessment (Feng et al., 2020, Wang et al., 1 Feb 2026).
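The workflow above can be sketched end-to-end on synthetic data. The per-site series, site names, and distributions below are purely illustrative stand-ins for real gauge or basin records:

```python
import numpy as np

def kge(sim, obs):
    # Canonical KGE from the three components
    r = np.corrcoef(sim, obs)[0, 1]
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

rng = np.random.default_rng(0)

# Data aggregation: hypothetical site -> (simulated, observed) daily series.
sites = {}
for i in range(10):
    obs = rng.gamma(2.0, 2.0, size=365)  # synthetic daily discharge-like values
    sim = obs * rng.uniform(0.8, 1.2) + rng.normal(0.0, 0.5, size=365)
    sites[f"site_{i}"] = (sim, obs)

# Component calculation + KGE synthesis per evaluation unit.
scores = {name: kge(sim, obs) for name, (sim, obs) in sites.items()}

# Summary statistic across the population of evaluation units.
median_kge = float(np.median(list(scores.values())))
```

In a real study the test-period series would come from a cross-validation split (e.g., LOO or region holdout) rather than a random generator, and the per-site distribution would typically be reported as a boxplot alongside the median.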
4. Empirical Results and Comparative Performance Assessment
KGE is extensively used to benchmark hydrological and precipitation models against observational data at scale:
| Study/Model | Evaluation Set | Median KGE | Key Findings |
|---|---|---|---|
| (Shi, 2024) DANN | Global ET (LOO CV, 129 sites) | Up to >0.8 | DANN increases KGE by 0.2–0.3 over LOO-RF; >0.2 gain for forests |
| (Feng et al., 2020) FDC-LSTM | US basins (PUR, 7 regions) | 0.556–0.619 | Sparse FDC boosts KGE by ~0.05, ensemble eliminates KGE<0 cases |
| (Wang et al., 1 Feb 2026) MSWEP V3 | Global precipitation, 15,958 gauges | 0.69 | Outperforms ERA5 (0.61), IMERG-L V7 (0.46), GSMaP (0.38), CHIRP (0.31) |
Reported improvements in KGE reflect not just accuracy gains but enhanced robustness, particularly the reduction or elimination of catastrophic prediction failures (sites/basins with $\mathrm{KGE} < 0$), a notable advantage in ungauged or extrapolative scenarios (Shi, 2024, Feng et al., 2020).
5. Comparison to Alternative Metrics and Benefits
Nash–Sutcliffe Efficiency (NSE), the classical choice for model evaluation, is defined as $\mathrm{NSE} = 1 - \sum_t (y_{\mathrm{sim},t} - y_{\mathrm{obs},t})^2 / \sum_t (y_{\mathrm{obs},t} - \bar{y}_{\mathrm{obs}})^2$, which aggregates total squared error without distinguishing underlying error components. This blending means a large error in bias, variance, or correlation can dominate the score, confounding physical interpretation. KGE improves on NSE by isolating and penalizing mismatches in correlation, bias, and variability explicitly, allowing direct diagnosis of which deficiency drives model-observation mismatch (Wang et al., 1 Feb 2026). This balanced approach has contributed to KGE's widespread adoption in hydrological evaluation and multi-model benchmarking (Feng et al., 2020).
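The contrast can be made concrete with a purely biased simulation: both metrics penalize it, but only the KGE decomposition identifies the error as bias ($r = \alpha = 1$, $\beta \neq 1$). A minimal sketch with illustrative data:

```python
import numpy as np

def nse(sim, obs):
    # Nash-Sutcliffe: 1 minus squared error normalized by observed spread
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

def kge_components(sim, obs):
    r = np.corrcoef(sim, obs)[0, 1]
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    score = 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
    return score, r, alpha, beta

obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
sim = obs + 1.0  # purely biased simulation: correct timing and variability

score_nse = nse(sim, obs)                           # a single blended penalty (0.5)
score_kge, r, alpha, beta = kge_components(sim, obs)
# r = 1 and alpha = 1, so the KGE deficit is attributable entirely to beta = 4/3.
```

NSE reports only that the fit is imperfect; the KGE components state *why*, which is the diagnostic advantage discussed above.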
6. Best Practices and Implementation Guidance
Effective usage of KGE requires adherence to the following recommendations:
- Component Disclosure: Always report individual values of $r$, $\alpha$ (or $\gamma$), and $\beta$ alongside KGE to clarify error source(s) (Shi, 2024, Wang et al., 1 Feb 2026).
- Cross-Validation Protocol: For extrapolation or ungauged testing, use robust CV schemes (LOO, region holdout) and analyze KGE distributions per evaluation unit (Shi, 2024, Feng et al., 2020).
- Definition Consistency: Specify the precise definition (including the order of numerator/denominator in the ratios) and the use of the square root, as minor implementation differences can yield discrepancies (Shi, 2024).
- Model Selection: KGE serves as both an overall metric and a loss/validation criterion in hyperparameter tuning, architecture search, or ensemble construction (Shi, 2024, Feng et al., 2020).
- Diagnostic Use: Decompose low KGE to inform targeted model improvements: enhancing data-driven feature extraction (to improve $r$), regularizing output variance (to tune $\alpha$/$\gamma$), or correcting systematic bias (to refine $\beta$) (Shi, 2024, Feng et al., 2020).
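As an illustration of the model-selection recommendation, the sketch below picks a hyperparameter by maximizing validation KGE. The model family (a simple gain applied to observations plus noise) and all names are hypothetical, chosen only to keep the example self-contained:

```python
import numpy as np

def kge(sim, obs):
    r = np.corrcoef(sim, obs)[0, 1]
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

rng = np.random.default_rng(42)
obs = rng.gamma(2.0, 2.0, size=500)          # synthetic validation observations
noise = rng.normal(0.0, 0.3, size=obs.size)  # fixed noise realization

# Hypothetical model family: sim = gain * obs + noise.
# Select the gain that maximizes validation KGE.
candidates = [0.5, 0.8, 1.0, 1.2, 1.5]
best_gain = max(candidates, key=lambda g: kge(g * obs + noise, obs))
```

Wrong gains inflate both $\alpha$ and $\beta$ away from unity, so the unbiased gain wins; the same pattern applies when KGE is used as a validation criterion in hyperparameter tuning or ensemble construction.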
7. Application Contexts and Recent Innovations
KGE is routinely used in current hydrology and geoscience machine learning benchmark suites. For instance, domain-adversarial neural networks (DANN) improve KGE significantly, demonstrating the utility of domain adaptation in enhancing model extrapolability, especially for ungauged basins or regions with unique biogeographic characteristics (Shi, 2024). Ensembling across input-combinations and assimilating auxiliary information (e.g., flow duration curves) mitigate catastrophic prediction failures, raising both median and lower-quantile KGE values (Feng et al., 2020). In global-scale precipitation analysis, KGE enables rigorous cross-product benchmarks and guides gauge-correction strategies (Wang et al., 1 Feb 2026).
A plausible implication is that future metric refinement will continue emphasizing explainable, multidimensional diagnostic measures. KGE’s decomposition facilitates physical insight, while its scalar nature permits succinct reporting.
References: Shi (2024); Feng et al. (2020); Wang et al. (1 Feb 2026).