A Widely Applicable Bayesian Information Criterion (1208.6338v1)

Published 31 Aug 2012 in cs.LG and stat.ML

Abstract: A statistical model or a learning machine is called regular if the map taking a parameter to a probability distribution is one-to-one and if its Fisher information matrix is always positive definite. If otherwise, it is called singular. In regular statistical models, the Bayes free energy, which is defined by the minus logarithm of Bayes marginal likelihood, can be asymptotically approximated by the Schwarz Bayes information criterion (BIC), whereas in singular models such approximation does not hold. Recently, it was proved that the Bayes free energy of a singular model is asymptotically given by a generalized formula using a birational invariant, the real log canonical threshold (RLCT), instead of half the number of parameters in BIC. Theoretical values of RLCTs in several statistical models are now being discovered based on algebraic geometrical methodology. However, it has been difficult to estimate the Bayes free energy using only training samples, because an RLCT depends on an unknown true distribution. In the present paper, we define a widely applicable Bayesian information criterion (WBIC) by the average log likelihood function over the posterior distribution with the inverse temperature $1/\log n$, where $n$ is the number of training samples. We mathematically prove that WBIC has the same asymptotic expansion as the Bayes free energy, even if a statistical model is singular for and unrealizable by a statistical model. Since WBIC can be numerically calculated without any information about a true distribution, it is a generalized version of BIC onto singular statistical models.

Summary

The paper introduces WBIC as a generalized Bayesian Information Criterion that overcomes BIC's limitations in singular statistical models.
It proves that the posterior average of the log likelihood function at inverse temperature $1/\log n$ has the same asymptotic expansion as the Bayes free energy, in which the real log canonical threshold (RLCT) replaces the half parameter count used by BIC.
Numerical experiments support the theory, indicating that WBIC is a practical model selection tool for singular models such as neural networks and mixture models.

A Widely Applicable Bayesian Information Criterion

This paper by Sumio Watanabe introduces a generalized Bayesian information criterion, the Widely Applicable Bayesian Information Criterion (WBIC). The criterion extends the Schwarz Bayesian Information Criterion (BIC) to singular statistical models, where the BIC approximation of the Bayes free energy does not hold.

Overview and Motivation

The paper begins by categorizing statistical models into regular and singular models. Regular models map parameters to probability distributions in a one-to-one manner with a positive definite Fisher Information matrix, allowing the use of approximations like BIC. However, many practical models, including artificial neural networks, normal mixtures, and hidden Markov models, are singular. Singular models are characterized by hierarchical layers, hidden variables, or grammatical rules, making them intrinsically more complex and challenging to evaluate.
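A standard illustration of a singular model, not taken from the summary above, is a regression model with a single hidden unit. The symbols $a$, $b$, and $f$ are chosen here for illustration only.

$$
p(y \mid x, a, b) = \mathcal{N}\big(y;\, f(x; a, b),\, 1\big), \qquad f(x; a, b) = a \tanh(b x)
$$

Every parameter with $a = 0$ or $b = 0$ gives $f \equiv 0$, so the map from $(a, b)$ to the distribution is not one-to-one. The gradient of the regression function is

$$
\nabla f = \big(\tanh(b x),\; a x \,\mathrm{sech}^2(b x)\big)
$$

so the Fisher information matrix $I(a, b) = \mathbb{E}_x\big[\nabla f \, \nabla f^\top\big]$ has a zero row and column wherever $a = 0$ or $b = 0$, and it is not positive definite there.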

Traditional criteria such as AIC, BIC, and MDL rest on asymptotic arguments that assume regularity, so they fall short when applied to singular models. This limitation led Watanabe to develop WBIC, a generalized criterion that applies to singular models and provides a more accurate asymptotic evaluation of the Bayes free energy.

Main Contributions

Watanabe's WBIC is defined as the average of the log likelihood function over the posterior distribution at inverse temperature $1/\log n$, where $n$ is the number of training samples. The author proves that WBIC has the same asymptotic expansion as the Bayes free energy, so it can stand in for the free energy when evaluating singular statistical models.
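Written out, the definition reads roughly as follows. The notation ($X_1, \dots, X_n$ for the samples, $p(x \mid w)$ for the model, $\varphi(w)$ for the prior, $L_n$ for the empirical log loss) is assumed here for illustration and is not fixed by this summary. The log loss is minus the log likelihood, so that WBIC is directly comparable with the free energy $F$.

$$
L_n(w) = -\frac{1}{n}\sum_{i=1}^{n} \log p(X_i \mid w), \qquad
F = -\log \int \prod_{i=1}^{n} p(X_i \mid w)\, \varphi(w)\, dw
$$

$$
\mathrm{WBIC} = \frac{\int n L_n(w) \prod_{i=1}^{n} p(X_i \mid w)^{\beta}\, \varphi(w)\, dw}{\int \prod_{i=1}^{n} p(X_i \mid w)^{\beta}\, \varphi(w)\, dw}, \qquad \beta = \frac{1}{\log n}
$$

With $\lambda$ the RLCT and $w_0$ a parameter that minimizes the expected log loss, both $F$ and WBIC behave as $n L_n(w_0) + \lambda \log n$ to leading order. In a regular model with $d$ parameters, $\lambda = d/2$, which recovers the BIC penalty.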

Key contributions include:

Theoretical Foundations: The paper establishes the theoretical underpinnings of WBIC through rigorous mathematical proofs. Watanabe states several theorems and lemmas that form the backbone of WBIC's validity. Central to the argument is the Real Log Canonical Threshold (RLCT), a birational invariant from earlier work on singular models that governs the asymptotic behavior of the Bayes free energy.
Optimal Inverse Temperature: One critical insight is that there exists a unique inverse temperature $\beta^*$ at which the posterior average matches the Bayes free energy, and that $\beta^*$ is asymptotically equal to $1/\log n$. Using $\beta = 1/\log n$ therefore makes WBIC computable without prior knowledge of the true distribution, which is often unknown.
Practical Application: The paper demonstrates the practical utility of WBIC in model evaluation, showing its effectiveness in scenarios where traditional criteria are inadequate. The empirical validation through experiments ensures that WBIC is not only a theoretical construct but also a practical tool for researchers and practitioners.
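To make the computation concrete, here is a minimal sketch on a toy model that is not from the paper: data from $N(w, \sigma^2)$ with known $\sigma$ and a $N(0, \tau^2)$ prior on $w$. This model is regular, and it is chosen only because its tempered posterior is Gaussian and its free energy has a closed form. All function names and defaults are hypothetical.

```python
import math
import random


def neg_log_lik(xs, w, sigma=1.0):
    """Return n * L_n(w): minus log likelihood of N(w, sigma^2) on xs."""
    n = len(xs)
    sq = sum((x - w) ** 2 for x in xs)
    return sq / (2 * sigma**2) + 0.5 * n * math.log(2 * math.pi * sigma**2)


def tempered_posterior(xs, beta, sigma=1.0, prior_sd=10.0):
    """Mean and sd of the density proportional to likelihood^beta * prior.

    The prior on w is N(0, prior_sd^2), so this density is Gaussian.
    """
    precision = 1.0 / prior_sd**2 + beta * len(xs) / sigma**2
    mean = beta * sum(xs) / sigma**2 / precision
    return mean, math.sqrt(1.0 / precision)


def wbic(xs, sigma=1.0, prior_sd=10.0, draws=4000, seed=0):
    """Monte Carlo average of n * L_n(w) at inverse temperature 1 / log n."""
    rng = random.Random(seed)
    beta = 1.0 / math.log(len(xs))
    mean, sd = tempered_posterior(xs, beta, sigma, prior_sd)
    samples = (rng.gauss(mean, sd) for _ in range(draws))
    return sum(neg_log_lik(xs, w, sigma) for w in samples) / draws


def bic(xs, sigma=1.0):
    """BIC in free-energy form: n * L_n(w_hat) + (d / 2) * log n, with d = 1."""
    w_hat = sum(xs) / len(xs)
    return neg_log_lik(xs, w_hat, sigma) + 0.5 * math.log(len(xs))


def free_energy(xs, sigma=1.0, prior_sd=10.0):
    """Exact minus log marginal likelihood for this conjugate toy model."""
    n, sx = len(xs), sum(xs)
    sxx = sum(x * x for x in xs)
    precision = 1.0 / prior_sd**2 + n / sigma**2
    log_z = (-0.5 * n * math.log(2 * math.pi * sigma**2)
             - sxx / (2 * sigma**2)
             - 0.5 * math.log(prior_sd**2 * precision)
             + (sx / sigma**2) ** 2 / (2 * precision))
    return -log_z


if __name__ == "__main__":
    data_rng = random.Random(1)
    data = [data_rng.gauss(0.3, 1.0) for _ in range(200)]
    print("WBIC       :", wbic(data))
    print("BIC        :", bic(data))
    print("free energy:", free_energy(data))
```

In this regular example WBIC and BIC nearly coincide, and both differ from the exact free energy only by a term that does not grow with $n$. For a singular model the tempered posterior has no closed form, so the draws would typically come from an MCMC sampler targeting the likelihood raised to the power $1/\log n$ times the prior.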

Numerical Results

Watanabe reports numerical experiments that support the theoretical claims. For example, in reduced rank regression models, where BIC fails because of singularities in the parameter space, WBIC serves as a reliable model selection criterion. Additionally, the same kind of computation can be used to estimate real log canonical thresholds (RLCTs) even when the true distribution is unknown, which adds to its practicality.
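One way such an RLCT estimate can be formed, stated here as an assumption rather than as the paper's exact procedure, follows from the leading-order behavior: if the tempered average of $n L_n(w)$ grows like $n L_n(w_0) + \lambda/\beta$, then a difference quotient at two inverse temperatures isolates $\lambda$. The sketch below applies this to a hypothetical toy model (Gaussian mean with a Gaussian prior), where the tempered average is available in closed form and the expected value is $\lambda = 1/2$.

```python
import math
import random


def expected_loss(xs, beta, sigma=1.0, prior_sd=10.0):
    """Closed-form average of n * L_n(w) over the tempered posterior at beta.

    Toy model: x ~ N(w, sigma^2) with prior w ~ N(0, prior_sd^2).
    """
    n = len(xs)
    precision = 1.0 / prior_sd**2 + beta * n / sigma**2
    mean = beta * sum(xs) / sigma**2 / precision
    var = 1.0 / precision
    # E[(x - w)^2] = (x - mean)^2 + var when w ~ N(mean, var)
    sq = sum((x - mean) ** 2 for x in xs) + n * var
    return sq / (2 * sigma**2) + 0.5 * n * math.log(2 * math.pi * sigma**2)


def estimate_rlct(xs, a1=1.0, a2=2.0):
    """Difference quotient of tempered averages at beta = a / log n."""
    log_n = math.log(len(xs))
    beta1, beta2 = a1 / log_n, a2 / log_n
    diff = expected_loss(xs, beta1) - expected_loss(xs, beta2)
    return diff / (1.0 / beta1 - 1.0 / beta2)


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.gauss(0.3, 1.0) for _ in range(200)]
    print("estimated RLCT:", estimate_rlct(data))
```

For a singular model the two tempered averages would have to be estimated by sampling, and the estimate is then only as good as those samples.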

Implications and Future Directions

The implications of this research are far-reaching, both in theory and practice. For theoretical statisticians and computer scientists, WBIC provides a robust tool for the asymptotic evaluation of complex models, filling a significant gap left by traditional criteria. Practically, WBIC can be employed to improve model selection in various domains, including machine learning, where models are often inherently singular.

Speculatively, future developments may focus on extending WBIC to broader classes of models and refining the computational methods for its application. Additionally, exploring the relationships between WBIC and other evaluation methods like WAIC could provide deeper insights into model performance and generalization error.

Conclusion

Watanabe's WBIC represents a substantial advancement in the field of statistical model evaluation, particularly for singular models. By providing a generalized criterion that aligns asymptotically with the Bayes free energy, WBIC offers a practical and theoretically sound method for model selection and evaluation. As research and applications of complex models continue to grow, tools like WBIC will become increasingly valuable in ensuring accurate and reliable statistical analysis.

Overall, the paper's contributions significantly enhance the toolkit available to researchers working with complex models, inviting further exploration and application in various fields of statistical learning and artificial intelligence.
