
Divergences induced by dual subtractive and divisive normalizations of exponential families and their convex deformations (2312.12849v2)

Published 20 Dec 2023 in cs.IT, cs.LG, and math.IT

Abstract: Exponential families are statistical models which are the workhorses in statistics, information theory, and machine learning, among others. An exponential family can either be normalized subtractively by its cumulant or free energy function, or equivalently normalized divisively by its partition function. Both subtractive and divisive normalizers are strictly convex and smooth functions inducing pairs of Bregman and Jensen divergences. It is well known that skewed Bhattacharyya distances between probability densities of an exponential family amount to skewed Jensen divergences induced by the cumulant function between their corresponding natural parameters, and that in limit cases the sided Kullback-Leibler divergences amount to reverse-sided Bregman divergences. In this paper, we first show that the $\alpha$-divergences between unnormalized densities of an exponential family amount to scaled $\alpha$-skewed Jensen divergences induced by the partition function. We then show how comparative convexity with respect to a pair of quasi-arithmetic means allows one to deform both convex functions and their arguments, and thereby define dually flat spaces with corresponding divergences when ordinary convexity is preserved.
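
For reference, the two normalizations mentioned in the abstract are related through $F(\theta) = \log Z(\theta)$; in standard exponential-family notation (with $t(x)$ the sufficient statistic and $\theta$ the natural parameter, names not spelled out on this page):

```latex
p_\theta(x) \;=\; \exp\big(\langle \theta, t(x)\rangle - F(\theta)\big)
          \;=\; \frac{\exp\langle \theta, t(x)\rangle}{Z(\theta)},
\qquad
F(\theta) \;=\; \log Z(\theta).
```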

Authors (1)
  1. Frank Nielsen

Summary

  • The paper shows that α-divergences between unnormalized densities of an exponential family amount to scaled α-skewed Jensen divergences induced by the partition function.
  • It introduces a comparative-convexity method that deforms convex functions and their arguments via quasi-arithmetic means, yielding dually flat spaces with corresponding divergences whenever ordinary convexity is preserved.
  • It connects statistical divergences between densities (Kullback-Leibler, Bhattacharyya, α-divergences) to parameter divergences (Bregman, Jensen) induced by the normalizers, with practical relevance for statistical inference and machine learning.

An Analytical Overview of Divergences in Exponential Families

This paper investigates the properties of divergences in exponential families, focusing on the dual subtractive and divisive normalizations defined by the cumulant and partition functions, respectively. It explores the relationships between various statistical divergence measures, for both normalized and unnormalized probability densities, within the framework of information geometry.

Key Concepts and Tools

  1. Exponential Families: Exponential families are widely used in statistics, machine learning, and information theory. They are parameterized by natural parameters and sufficient statistics, and can be normalized either through the cumulant function (subtractive normalization) or through the partition function (divisive normalization).
  2. Normalization Functions: The cumulant function, also called the free energy function, and the partition function are central to the discussion. Both are strictly convex and smooth, and each induces a Bregman divergence and a Jensen divergence, which are significant for defining dually flat spaces in information geometry.
  3. Divergences: The paper considers several divergences, including the Bregman divergence, the Jensen divergence, the Bhattacharyya distance, and the α-divergences; the standard definitions are recalled after this list. A particular focus is placed on how these divergences are related through the cumulant and partition functions of exponential families.
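
As a reminder of the standard definitions used throughout (following the usual information-geometry conventions), a strictly convex and smooth function $F$ induces the following parameter divergences, and the cumulant function ties them back to the Kullback-Leibler divergence as stated in the abstract:

```latex
% Bregman divergence induced by a strictly convex, smooth F
B_F(\theta_1 : \theta_2) = F(\theta_1) - F(\theta_2)
    - \langle \theta_1 - \theta_2,\, \nabla F(\theta_2) \rangle,

% alpha-skewed Jensen divergence induced by F (0 < alpha < 1)
J_{F,\alpha}(\theta_1 : \theta_2) = \alpha F(\theta_1) + (1-\alpha) F(\theta_2)
    - F(\alpha \theta_1 + (1-\alpha) \theta_2),

% sided KL divergence between exponential-family densities equals a
% reverse-sided Bregman divergence between their natural parameters
\mathrm{KL}(p_{\theta_1} : p_{\theta_2}) = B_F(\theta_2 : \theta_1),
\qquad F = \log Z.
```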

Major Contributions and Findings

  • Scaled α-Divergences: It is shown that the α-divergences between unnormalized densities of an exponential family are equivalent to scaled α-skewed Jensen divergences induced by the partition function (see the numerical sketch after this list). This result builds on the idea that statistical divergences between densities can be expressed as divergences between their parameters.
  • Comparative Convexity: By employing comparative convexity, the paper introduces a method to deform convex functions and their arguments using pairs of quasi-arithmetic means. The authors demonstrate that when ordinary convexity is preserved under such deformation, new dually flat spaces with corresponding divergences are obtained.
  • Practical Insights: One application discussed is the link between the Kullback-Leibler divergence of probability densities in an exponential family and the corresponding reverse Bregman divergence of their natural parameters (also exercised in the sketch below). This bridge between statistical divergences and parameter divergences is useful in machine learning and statistical inference, where closed-form parameter divergences can replace intractable integrals.
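
A minimal numerical sketch of these identities, using the exponential distribution family as an assumed concrete example (unnormalized densities $\tilde p_\theta(x) = e^{\theta x}$ on $[0,\infty)$ with natural parameter $\theta < 0$, partition function $Z(\theta) = -1/\theta$, cumulant $F(\theta) = -\log(-\theta)$); the paper itself works with general exponential families:

```python
import numpy as np
from scipy.integrate import quad

# Exponential distribution family: unnormalized density exp(theta * x)
# on [0, inf) with natural parameter theta < 0.
def Z(theta):
    """Partition function: integral of exp(theta*x) over [0, inf)."""
    return -1.0 / theta

def F(theta):
    """Cumulant (log-partition / free energy) function."""
    return -np.log(-theta)

def gradF(theta):
    """Gradient of the cumulant function."""
    return -1.0 / theta

def bregman_F(t1, t2):
    """Bregman divergence B_F(t1 : t2) induced by the cumulant F."""
    return F(t1) - F(t2) - (t1 - t2) * gradF(t2)

def skew_jensen_Z(t1, t2, alpha):
    """alpha-skewed Jensen divergence J_{Z,alpha}(t1 : t2) induced by Z."""
    return alpha * Z(t1) + (1 - alpha) * Z(t2) - Z(alpha * t1 + (1 - alpha) * t2)

def alpha_div_unnormalized(t1, t2, alpha):
    """Extended alpha-divergence between the unnormalized densities."""
    integrand = lambda x: (alpha * np.exp(t1 * x) + (1 - alpha) * np.exp(t2 * x)
                           - np.exp(alpha * t1 * x + (1 - alpha) * t2 * x))
    val, _ = quad(integrand, 0, np.inf)
    return val / (alpha * (1 - alpha))

def kl_normalized(t1, t2):
    """KL divergence between the normalized densities p_t1 and p_t2."""
    integrand = lambda x: (np.exp(t1 * x) / Z(t1)
                           * ((t1 - t2) * x - F(t1) + F(t2)))
    val, _ = quad(integrand, 0, np.inf)
    return val

t1, t2, a = -1.5, -0.7, 0.3
# alpha-divergence of unnormalized densities == scaled skewed Jensen of Z:
print(alpha_div_unnormalized(t1, t2, a))         # ~0.6484
print(skew_jensen_Z(t1, t2, a) / (a * (1 - a)))  # ~0.6484
# KL between normalized densities == reverse-sided Bregman divergence of F:
print(kl_normalized(t1, t2))                     # ~0.2288
print(bregman_F(t2, t1))                         # same value
```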

Implications and Future Directions

From a theoretical perspective, this work enriches the understanding of divergence measures by connecting different types of divergences via convex functions. Practically, these insights could influence the development of new algorithms in machine learning, particularly in areas like probabilistic models and information geometry.

Future developments could involve extending these concepts to other types of divergences or exploring applications in different domains of machine learning and statistics. Additionally, the deformation techniques introduced could be further refined and applied to more complex models beyond traditional exponential families.

In summary, this paper offers a thorough and rigorously framed analysis of divergences within exponential families, presenting connections that hold significant theoretical interest and potential practical applications.