
Divergences induced by dual subtractive and divisive normalizations of exponential families and their convex deformations (2312.12849v2)

Published 20 Dec 2023 in cs.IT, cs.LG, and math.IT

Abstract: Exponential families are statistical models which are the workhorses in statistics, information theory, and machine learning, among others. An exponential family can either be normalized subtractively by its cumulant or free energy function, or equivalently normalized divisively by its partition function. Both subtractive and divisive normalizers are strictly convex and smooth functions inducing pairs of Bregman and Jensen divergences. It is well known that skewed Bhattacharyya distances between probability densities of an exponential family amount to skewed Jensen divergences induced by the cumulant function between their corresponding natural parameters, and that in limit cases the sided Kullback-Leibler divergences amount to reverse-sided Bregman divergences. In this paper, we first show that the $\alpha$-divergences between unnormalized densities of an exponential family amount to scaled $\alpha$-skewed Jensen divergences induced by the partition function. We then show how comparative convexity with respect to a pair of quasi-arithmetic means allows one to deform both convex functions and their arguments, and thereby define dually flat spaces with corresponding divergences when ordinary convexity is preserved.
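
For reference, the two normalizations mentioned in the abstract are related through $F(\theta) = \log Z(\theta)$; in standard exponential-family notation (with $t(x)$ the sufficient statistic and $\theta$ the natural parameter, names not spelled out on this page):

```latex
p_\theta(x) \;=\; \exp\big(\langle \theta, t(x)\rangle - F(\theta)\big)
          \;=\; \frac{\exp\langle \theta, t(x)\rangle}{Z(\theta)},
\qquad
F(\theta) \;=\; \log Z(\theta).
```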

Authors (1)
  1. Frank Nielsen

Summary

  • The paper shows that α-divergences between unnormalized densities of an exponential family amount to scaled α-skewed Jensen divergences induced by the partition function.
  • It introduces a comparative-convexity method that deforms convex functions and their arguments via quasi-arithmetic means, yielding dually flat spaces with corresponding divergences whenever ordinary convexity is preserved.
  • It connects statistical divergences between densities (Kullback-Leibler, Bhattacharyya, α-divergences) to parameter divergences (Bregman, Jensen) induced by the normalizers, with practical relevance for statistical inference and machine learning.

An Analytical Overview of Divergences in Exponential Families

This paper investigates the properties of divergences in exponential families, focusing on the dual subtractive and divisive normalizations defined by the cumulant and partition functions, respectively. It explores the relationships between various statistical divergence measures, for both normalized and unnormalized probability densities, within the framework of information geometry.

Key Concepts and Tools

  1. Exponential Families: Exponential families are widely used in statistics, machine learning, and information theory. They are parameterized by natural parameters and sufficient statistics, and can be normalized either through the cumulant function (subtractive normalization) or through the partition function (divisive normalization).
  2. Normalization Functions: The cumulant function, also called the free energy function, and the partition function are central to the discussion. Both are strictly convex and smooth, and each induces a Bregman divergence and a Jensen divergence, which are significant for defining dually flat spaces in information geometry.
  3. Divergences: The paper considers several divergences, including the Bregman divergence, the Jensen divergence, the Bhattacharyya distance, and the α-divergences; the standard definitions are recalled after this list. A particular focus is placed on how these divergences are related through the cumulant and partition functions of exponential families.
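
As a reminder of the standard definitions used throughout (following the usual information-geometry conventions), a strictly convex and smooth function $F$ induces the following parameter divergences, and the cumulant function ties them back to the Kullback-Leibler divergence as stated in the abstract:

```latex
% Bregman divergence induced by a strictly convex, smooth F
B_F(\theta_1 : \theta_2) = F(\theta_1) - F(\theta_2)
    - \langle \theta_1 - \theta_2,\, \nabla F(\theta_2) \rangle,

% alpha-skewed Jensen divergence induced by F (0 < alpha < 1)
J_{F,\alpha}(\theta_1 : \theta_2) = \alpha F(\theta_1) + (1-\alpha) F(\theta_2)
    - F(\alpha \theta_1 + (1-\alpha) \theta_2),

% sided KL divergence between exponential-family densities equals a
% reverse-sided Bregman divergence between their natural parameters
\mathrm{KL}(p_{\theta_1} : p_{\theta_2}) = B_F(\theta_2 : \theta_1),
\qquad F = \log Z.
```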

Major Contributions and Findings

  • Scaled α-Divergences: It is shown that the α-divergences between unnormalized densities of an exponential family are equivalent to scaled α-skewed Jensen divergences induced by the partition function (see the numerical sketch after this list). This result builds on the idea that statistical divergences between densities can be expressed as divergences between their parameters.
  • Comparative Convexity: By employing comparative convexity, the paper introduces a method to deform convex functions and their arguments using pairs of quasi-arithmetic means. The authors demonstrate that when ordinary convexity is preserved under such deformation, new dually flat spaces with corresponding divergences are obtained.
  • Practical Insights: One application discussed is the link between the Kullback-Leibler divergence of probability densities in an exponential family and the corresponding reverse Bregman divergence of their natural parameters (also exercised in the sketch below). This bridge between statistical divergences and parameter divergences is useful in machine learning and statistical inference, where closed-form parameter divergences can replace intractable integrals.
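
A minimal numerical sketch of these identities, using the exponential distribution family as an assumed concrete example (unnormalized densities $\tilde p_\theta(x) = e^{\theta x}$ on $[0,\infty)$ with natural parameter $\theta < 0$, partition function $Z(\theta) = -1/\theta$, cumulant $F(\theta) = -\log(-\theta)$); the paper itself works with general exponential families:

```python
import numpy as np
from scipy.integrate import quad

# Exponential distribution family: unnormalized density exp(theta * x)
# on [0, inf) with natural parameter theta < 0.
def Z(theta):
    """Partition function: integral of exp(theta*x) over [0, inf)."""
    return -1.0 / theta

def F(theta):
    """Cumulant (log-partition / free energy) function."""
    return -np.log(-theta)

def gradF(theta):
    """Gradient of the cumulant function."""
    return -1.0 / theta

def bregman_F(t1, t2):
    """Bregman divergence B_F(t1 : t2) induced by the cumulant F."""
    return F(t1) - F(t2) - (t1 - t2) * gradF(t2)

def skew_jensen_Z(t1, t2, alpha):
    """alpha-skewed Jensen divergence J_{Z,alpha}(t1 : t2) induced by Z."""
    return alpha * Z(t1) + (1 - alpha) * Z(t2) - Z(alpha * t1 + (1 - alpha) * t2)

def alpha_div_unnormalized(t1, t2, alpha):
    """Extended alpha-divergence between the unnormalized densities."""
    integrand = lambda x: (alpha * np.exp(t1 * x) + (1 - alpha) * np.exp(t2 * x)
                           - np.exp(alpha * t1 * x + (1 - alpha) * t2 * x))
    val, _ = quad(integrand, 0, np.inf)
    return val / (alpha * (1 - alpha))

def kl_normalized(t1, t2):
    """KL divergence between the normalized densities p_t1 and p_t2."""
    integrand = lambda x: (np.exp(t1 * x) / Z(t1)
                           * ((t1 - t2) * x - F(t1) + F(t2)))
    val, _ = quad(integrand, 0, np.inf)
    return val

t1, t2, a = -1.5, -0.7, 0.3
# alpha-divergence of unnormalized densities == scaled skewed Jensen of Z:
print(alpha_div_unnormalized(t1, t2, a))         # ~0.6484
print(skew_jensen_Z(t1, t2, a) / (a * (1 - a)))  # ~0.6484
# KL between normalized densities == reverse-sided Bregman divergence of F:
print(kl_normalized(t1, t2))                     # ~0.2288
print(bregman_F(t2, t1))                         # same value
```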

Implications and Future Directions

From a theoretical perspective, this work enriches the understanding of divergence measures by connecting different types of divergences via convex functions. Practically, these insights could influence the development of new algorithms in machine learning, particularly in areas like probabilistic models and information geometry.

Future developments could involve extending these concepts to other types of divergences or exploring applications in different domains of machine learning and statistics. Additionally, the deformation techniques introduced could be further refined and applied to more complex models beyond traditional exponential families.

In summary, this paper offers a thorough and rigorously framed analysis of divergences within exponential families, presenting connections that hold significant theoretical interest and potential practical applications.