
Widely Applicable Information Criterion (WAIC)

Updated 6 February 2026
  • WAIC is a fully Bayesian criterion that estimates out-of-sample predictive performance by balancing model fit and complexity via a data-adaptive penalty based on posterior variance.
  • WAIC quantifies effective model complexity by summing the posterior variances of log-likelihood contributions, making it robust for singular, hierarchical, and nonparametric models.
  • WAIC is computationally efficient as it requires only one full Bayesian fit and is asymptotically equivalent to Bayesian leave-one-out cross-validation under mild regularity conditions.

The Widely Applicable Information Criterion (WAIC) is a fully Bayesian information criterion that provides a general, theoretically justified approach for estimating out-of-sample predictive accuracy in parametric and nonparametric models, including those classified as singular. WAIC was introduced by Watanabe to extend the reliability of information-theoretic model selection to modern Bayesian modeling scenarios where classical criteria such as AIC, BIC, or DIC falter due to the breakdown of regular asymptotic theory, model singularity, or strong prior influence. WAIC is defined as a penalized estimate of expected log predictive density, incorporating both model fit and a data-adaptive Bayesian penalty for model complexity, and is asymptotically equivalent to Bayesian leave-one-out cross-validation (LOO-CV) under mild regularity, even in singular settings (Watanabe, 2010, Watanabe, 2015, Hayashi et al., 20 May 2025).

1. Formal Definition and Mathematical Properties

WAIC estimates twice the negative expected log pointwise predictive density (lppd), penalized by the sum of posterior variances of the log-likelihood contributions, which serves as the effective number of parameters. Formally, for i.i.d. data $y_{1:n}$ modeled by the likelihood $p(y_i \mid \theta)$ and posterior $p(\theta \mid y_{1:n})$:

$$\begin{align*} \mathrm{lppd} &= \sum_{i=1}^n \log \left(\int p(y_i \mid \theta)\, p(\theta \mid y_{1:n})\, d\theta\right), \\ p_{\mathrm{WAIC}} &= \sum_{i=1}^n \mathrm{Var}_{p(\theta \mid y_{1:n})}\bigl[\log p(y_i \mid \theta)\bigr], \\ \mathrm{WAIC} &= -2\,\mathrm{lppd} + 2\,p_{\mathrm{WAIC}} \\ &= -2 \sum_{i=1}^n \log \mathbb{E}_{\theta}\bigl[p(y_i \mid \theta)\bigr] + 2 \sum_{i=1}^n \mathrm{Var}_{\theta}\bigl[\log p(y_i \mid \theta)\bigr]. \end{align*}$$

Practically, with $S$ posterior samples $\{\theta^{(s)}\}$, the lppd and penalty are estimated as:

$$\widehat{\mathrm{lppd}} = \sum_{i=1}^n \log\left(\frac{1}{S} \sum_{s=1}^S p(y_i \mid \theta^{(s)})\right), \quad \widehat{p}_{\mathrm{WAIC}} = \sum_{i=1}^n \frac{1}{S-1} \sum_{s=1}^S \bigl(\log p(y_i \mid \theta^{(s)}) - \overline{\ell}_i\bigr)^2$$

where $\overline{\ell}_i = \frac{1}{S} \sum_{s=1}^S \log p(y_i \mid \theta^{(s)})$.
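These estimators are straightforward to compute given a matrix of pointwise log-likelihood evaluations. The sketch below is a minimal illustration (the function name `waic` and the `(S, n)` array layout are conventions assumed here, not fixed by the source); note that direct exponentiation can underflow for very negative log-likelihoods, so production code typically works on the log scale.

```python
import numpy as np

def waic(log_lik):
    """Monte Carlo WAIC estimate from pointwise log-likelihoods.

    log_lik: array of shape (S, n), entry [s, i] = log p(y_i | theta^(s))
    over S posterior draws and n observations.
    """
    # lppd_i: log of the posterior-mean likelihood of each observation
    lppd = np.log(np.exp(log_lik).mean(axis=0))
    # p_WAIC contribution: posterior variance of each pointwise log-likelihood
    p_waic = log_lik.var(axis=0, ddof=1)
    return -2.0 * (lppd.sum() - p_waic.sum())
```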

  WAIC selectively penalizes overfitting via $p_{\mathrm{WAIC}}$, which adaptively estimates the "effective number of parameters" in any Bayesian model (Yong, 2018, Vehtari et al., 2015, Jung et al., 2024).

2. Theoretical Foundations and Asymptotic Equivalence to LOO-CV

WAIC is derived to approximate the Bayesian expected log predictive density for out-of-sample data, correcting for the bias induced by evaluating the fitted model on the same data used for estimation. The core theoretical result is that, under minimal regularity assumptions (including singular/non-identifiable models), WAIC is asymptotically equivalent to Bayesian leave-one-out cross-validation both in expectation and as a random variable, up to $O(n^{-2})$ (Watanabe, 2010, Watanabe, 2015, Hayashi et al., 20 May 2025). Formally, for large $n$, the difference between WAIC and LOO-CV is $O_p(n^{-2})$, and both consistently minimize the average generalization loss.

In singular models, the asymptotic generalization error decays as $\lambda/n$, where $\lambda$ is the real log canonical threshold (RLCT) of the model's likelihood geometry. Classical criteria tied to the parameter dimension (AIC: $d/n$; BIC: $(d/2)\log n / n$) fail in such cases, whereas WAIC's penalty correctly tracks the learning-theoretic complexity (Hayashi et al., 20 May 2025, Watanabe, 2010).
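The data-adaptive nature of the penalty is easy to see even in a regular conjugate example (a sketch constructed here for illustration, not taken from the cited papers): for $y_i \sim N(\theta, 1)$ with prior $\theta \sim N(0, \tau^2)$, a diffuse prior gives $p_{\mathrm{WAIC}} \approx 1$ (the nominal dimension $d = 1$), while a very strong prior shrinks the posterior and drives the effective number of parameters toward zero.

```python
import numpy as np

rng = np.random.default_rng(0)

def p_waic_normal(y, tau2, S=4000):
    """p_WAIC for y_i ~ N(theta, 1) with conjugate prior theta ~ N(0, tau2)."""
    n = len(y)
    v = 1.0 / (n + 1.0 / tau2)           # posterior variance
    m = v * y.sum()                      # posterior mean
    theta = rng.normal(m, np.sqrt(v), size=S)
    # (n, S) matrix of pointwise log-likelihoods under each posterior draw
    log_lik = -0.5 * np.log(2 * np.pi) - 0.5 * (y[:, None] - theta[None, :]) ** 2
    return log_lik.var(axis=1, ddof=1).sum()

y = rng.normal(1.0, 1.0, size=50)
weak = p_waic_normal(y, tau2=100.0)     # diffuse prior: roughly 1 effective parameter
strong = p_waic_normal(y, tau2=1e-4)    # strong prior: penalty collapses toward 0
```

No fixed parameter count enters the computation; the penalty is read off the posterior itself, which is what allows the same formula to remain meaningful in singular models.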

3. Practical Computation in Bayesian Workflows

WAIC requires only a single full-Bayesian fit—no repetitive refitting or direct marginal likelihood evaluation. The computational recipe is:

  1. Run MCMC or another posterior sampler to obtain $S$ draws $\{\theta^{(s)}\}$.
  2. For each data point $i$, calculate $\ell_i^{(s)} = \log p(y_i \mid \theta^{(s)})$.
  3. Compute

$$\begin{split} \widehat{\mathrm{lppd}}_i &= \log\left(\frac{1}{S} \sum_{s=1}^S e^{\ell_i^{(s)}}\right), \\ \widehat{v}_i &= \frac{1}{S-1} \sum_{s=1}^S \bigl(\ell_i^{(s)} - \overline{\ell}_i\bigr)^2, \end{split}$$

then sum over ii to obtain global fit and penalty.

  4. Assemble

$$\mathrm{WAIC} = -2 \sum_{i=1}^n \widehat{\mathrm{lppd}}_i + 2 \sum_{i=1}^n \widehat{v}_i,$$

or equivalently, compute the estimated expected log pointwise predictive density, $\widehat{\mathrm{elpd}}_{\mathrm{WAIC}} = \sum_{i=1}^n (\widehat{\mathrm{lppd}}_i - \widehat{v}_i)$, and set $\mathrm{WAIC} = -2\,\widehat{\mathrm{elpd}}_{\mathrm{WAIC}}$ (Vehtari et al., 2015, Watanabe, 2015).
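The recipe above can be sketched end to end; to avoid underflow when exponentiating very negative $\ell_i^{(s)}$, the pointwise lppd is computed with a log-mean-exp (the helper names below are assumptions made for illustration):

```python
import numpy as np

def log_mean_exp(a, axis=0):
    """Numerically stable log(mean(exp(a))) along an axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return m.squeeze(axis=axis) + np.log(np.mean(np.exp(a - m), axis=axis))

def waic_report(log_lik):
    """Steps 1-4 for an (S, n) matrix of pointwise log-likelihoods.

    Returns (elpd_waic, p_waic, waic).
    """
    lppd_i = log_mean_exp(log_lik, axis=0)   # step 3: pointwise fit terms
    v_i = np.var(log_lik, axis=0, ddof=1)    # step 3: pointwise penalty terms
    elpd = np.sum(lppd_i - v_i)              # step 4, elpd formulation
    return elpd, np.sum(v_i), -2.0 * elpd
```

On the log scale the estimate stays finite even when every likelihood value would underflow to zero in double precision.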

For time series, Markov, or other dependent data, the recently developed covariance-corrected WAIC (CC-WAIC) incorporates posterior covariance of log-likelihoods to provide an improved penalty, especially in non-i.i.d. contexts (Kadhem, 22 Sep 2025).

4. Comparative Assessment: WAIC, LOO, DIC, and Contemporary Variants

WAIC is fully Bayesian—integrating over the posterior—whereas DIC evaluates fit at a single posterior mean or mode. Empirical studies and simulation benchmarks in item response models, diagnostic classification, hierarchical regression, and other contexts show:

  • WAIC vs LOO (PSIS-LOO): Empirically similar, but PSIS-LOO can be more robust in cases of weak priors or influential data points. WAIC may under- or over-correct bias for highly skewed posteriors or with large log-likelihood variance, but offers substantial computational speedups (Jung et al., 2024, Vehtari et al., 2015, Yong, 2018).
  • WAIC vs DIC: DIC may inadequately penalize complex or hierarchical models, especially in singular cases. WAIC, by using posterior variance, more adaptively captures local overfitting and remains reliable in settings where DIC's point-estimate-based complexity measure fails (Vehtari et al., 2015, Jung et al., 2024, Yong, 2018).
  • Bias-corrected and generalized forms: Recent work has shown how to correct WAIC's $O(1/n)$ bias by incorporating a posterior covariance penalty or higher-order cumulants (see PCIC and CC-WAIC) (Iba et al., 2022, Iba et al., 2021, Kadhem, 22 Sep 2025). In weighted or covariate-shifted inference, PCIC replaces the variance penalty with a posterior covariance involving the training score function, achieving asymptotic unbiasedness for the generalization error.
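The contrast between DIC's point-estimate penalty ($p_D$, the gap between the mean deviance and the deviance at the posterior mean) and WAIC's variance penalty can be made concrete in a regular conjugate model, where the two agree; the example below is constructed here for illustration, not drawn from the cited studies. In singular or multimodal posteriors the posterior mean can sit in a low-density region, making $p_D$ unreliable, whereas $p_{\mathrm{WAIC}}$ depends only on posterior variation.

```python
import numpy as np

rng = np.random.default_rng(1)
n, S = 40, 5000
y = rng.normal(0.0, 1.0, size=n)

# Conjugate posterior for y_i ~ N(theta, 1) with a diffuse N(0, 100) prior
v = 1.0 / (n + 0.01)
m = v * y.sum()
theta = rng.normal(m, np.sqrt(v), size=S)

def total_log_lik(th):
    """Total log-likelihood of the sample at each parameter value in th."""
    th = np.atleast_1d(th)
    return np.sum(-0.5 * np.log(2 * np.pi)
                  - 0.5 * (y[:, None] - th[None, :]) ** 2, axis=0)

ll_s = total_log_lik(theta)
# DIC penalty: routed through a single point estimate (the posterior mean)
p_dic = 2.0 * (total_log_lik(theta.mean())[0] - ll_s.mean())
# WAIC penalty: summed posterior variances of the *pointwise* log-likelihoods
log_lik = -0.5 * np.log(2 * np.pi) - 0.5 * (y[:, None] - theta[None, :]) ** 2
p_waic = log_lik.var(axis=1, ddof=1).sum()
# In this regular model both penalties land near the true dimension d = 1
```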

5. Applications and Empirical Performance

Extensive simulation studies and real-data applications have validated the utility and robustness of WAIC in various domains:

  • Polytomous item response modeling: WAIC and LOO typically have statistical power >0.93 to detect the true generating model, although WAIC tends to over-penalize in smaller samples or with more complex models and exhibits slightly reduced power compared to classical frequentist criteria and DIC (Yong, 2018).
  • Bayesian multilevel group lasso: WAIC-guided regularization hyperparameter tuning avoids pathological overshrinkage encountered with empirical/fully Bayesian marginal likelihood approaches in high-dimensional, weak-signal regimes (Nathoo et al., 2016).
  • Diagnostic classification models (Bayesian DCMs): WAIC and PSIS-LOO preferentially select the correct model in most simulated settings. DIC may have advantage for highly parameterized models under small sample size, but WAIC/LOO prevent overfitting better, especially with weak priors (Jung et al., 2024).
  • Out-of-distribution detection: WAIC provides a practical, unsupervised uncertainty quantification—via ensemble variance of the log-likelihood—distinguishing in- vs out-of-distribution spectra in intra-operative functional imaging, with tractable computation via invertible neural network ensembles (Adler et al., 2019).
  • Bayesian external data borrowing: In clinical trial mixture-prior borrowing, the WAIC-based "WOW" gate delivers a fully Bayesian, cross-validation-like diagnostic that blocks borrowing when external and current data are discordant, reducing bias and overfitting (Zhou et al., 6 Oct 2025).

6. Extensions, Limitations, and Current Research Directions

WAIC assumes conditional independence unless explicitly corrected (as in CC-WAIC). In dependent-data regimes (HMMs, time-series), the classical penalty underestimates model complexity; covariance-corrected WAIC (CC-WAIC) incorporates off-diagonal covariance terms, yielding more accurate model complexity estimates and improved model selection, particularly in small-sample or highly dependent scenarios (Kadhem, 22 Sep 2025).
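One way to see the idea behind the correction (an illustration of the covariance principle only; the exact CC-WAIC formula is given in Kadhem, 22 Sep 2025): replace the sum of posterior variances by the sum of all entries of the posterior covariance matrix of the pointwise log-likelihoods. The diagonal recovers the standard $p_{\mathrm{WAIC}}$; the off-diagonal terms add the contribution of dependence.

```python
import numpy as np

def penalties(log_lik):
    """Standard vs covariance-inclusive complexity penalties (illustrative).

    log_lik: (S, n) matrix of log p(y_i | theta^(s)).
    Returns (p_waic, p_cov): the trace of the posterior covariance matrix
    of the pointwise log-likelihoods, and the sum of all its entries.
    """
    C = np.cov(log_lik, rowvar=False)   # n x n posterior covariance matrix
    return np.trace(C), C.sum()

rng = np.random.default_rng(2)
theta = rng.normal(size=5000)
# Perfectly dependent pointwise log-likelihoods: off-diagonals double the penalty
dependent = np.stack([theta, theta], axis=1)
# Independent pointwise log-likelihoods: the two penalties nearly coincide
independent = rng.normal(size=(5000, 2))
```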

WAIC’s finite-sample bias, particularly in the presence of strong priors or heavy-tailed log likelihoods, is well-documented. Alternatives such as posterior covariance information criterion (PCIC) generalize the penalty structure to arbitrary loss functions and covariate-shifted settings (Iba et al., 2021, Iba et al., 2022).

Research continues to extend WAIC's applicability:

  • Singular model theory: New results link WAIC and WBIC via a formal asymptotic equation, establishing that, for computational efficiency, WAIC may be reliably estimated at a single tempered posterior, reducing computational overhead in model selection for singular models (Hayashi et al., 20 May 2025).
  • Bias-corrected variants: Empirical and theoretical studies show that variance/covariance-penalty corrections remove leading-order bias, especially important with strong priors or in high-dimensional models (Imai, 2019, Iba et al., 2022).
  • Generalized Bayesian inference: PCIC and other WAIC generalizations allow for consistent predictive evaluation beyond the standard log-likelihood setting, including differential privacy-preserving inference and counterfactual prediction (Iba et al., 2021, Iba et al., 2022).

7. Summary Table: Key Properties and Comparative Performance

| Criterion | Posterior usage | Model class | Penalty term | Asymptotic target | Empirical notes |
|---|---|---|---|---|---|
| WAIC | Full posterior | Regular/singular | Sum of posterior variances of log-lik | Bayes LOO-CV / generalization | Efficient, robust |
| DIC | Point/posterior mean | Regular | 2[mean deviance − deviance at mean] | Often deviates when singular | Simpler, less robust |
| PSIS-LOO | Full (via importance sampling) | Regular/singular | Leave-one-out deviance | Bayes LOO-CV / generalization | More stable in weak-prior regimes |
| CC-WAIC | Full (with covariances) | Dependent data (HMMs, time series) | Total variance/covariance of log-lik | LOO / generalization | Superior for correlated data |
| PCIC | Full (covariance) | Log/non-log loss, weighted | Covariance penalty | Out-of-sample risk (arbitrary loss) | Unbiased, general loss |

WAIC combines computational feasibility with strong theoretical guarantees under broad modeling conditions. It remains the default Bayesian information criterion for model selection, hyperparameter tuning, predictive prior design, and uncertainty quantification in the Bayesian statistics and machine learning literature (Watanabe, 2010, Watanabe, 2015, Vehtari et al., 2015, Hayashi et al., 20 May 2025, Kadhem, 22 Sep 2025). Ongoing research continues to refine its bias properties, extend its applicability to new problem settings, and improve estimation under challenging dependence structures and in the presence of model misspecification.
