Divergences induced by dual subtractive and divisive normalizations of exponential families and their convex deformations (2312.12849v2)
Abstract: Exponential families are statistical models which are the workhorses in statistics, information theory, and machine learning, among other fields. An exponential family can either be normalized subtractively by its cumulant (or free-energy) function or, equivalently, normalized divisively by its partition function. Both the subtractive and divisive normalizers are strictly convex and smooth functions, inducing pairs of Bregman and Jensen divergences. It is well known that the skewed Bhattacharyya distances between probability densities of an exponential family amount to skewed Jensen divergences induced by the cumulant function between their corresponding natural parameters, and that in limit cases the sided Kullback-Leibler divergences amount to reverse-sided Bregman divergences. In this paper, we first show that the $\alpha$-divergences between unnormalized densities of an exponential family amount to scaled $\alpha$-skewed Jensen divergences induced by the partition function. We then show how comparative convexity with respect to a pair of quasi-arithmetic means allows one to deform both convex functions and their arguments, thereby defining dually flat spaces with corresponding divergences when ordinary convexity is preserved.
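The two classical identities recalled in the abstract can be checked numerically. The sketch below (illustrative, not from the paper) uses the exponential distribution family $\mathrm{Exp}(\lambda)$, an exponential family with natural parameter $\theta = -\lambda < 0$ and cumulant function $F(\theta) = -\log(-\theta)$; the closed-form KL and skewed Bhattacharyya distances are standard, and the function names are our own:

```python
import math

# Exp(lam) as a 1D exponential family: natural parameter theta = -lam (theta < 0),
# cumulant (log-normalizer) F(theta) = -log(-theta), with gradient F'(theta) = -1/theta.
F = lambda th: -math.log(-th)
gradF = lambda th: -1.0 / th

def bregman(th1, th2):
    """Bregman divergence B_F(th1 : th2) induced by the cumulant F."""
    return F(th1) - F(th2) - (th1 - th2) * gradF(th2)

def jensen(th1, th2, alpha):
    """alpha-skewed Jensen divergence J_{F,alpha}(th1 : th2) induced by F."""
    return alpha * F(th1) + (1 - alpha) * F(th2) - F(alpha * th1 + (1 - alpha) * th2)

def kl_exp(l1, l2):
    """Closed-form KL(Exp(l1) || Exp(l2))."""
    return math.log(l1 / l2) + l2 / l1 - 1

def bhattacharyya_exp(l1, l2, alpha):
    """Closed-form alpha-skewed Bhattacharyya distance between Exp(l1) and Exp(l2)."""
    return -(alpha * math.log(l1) + (1 - alpha) * math.log(l2)
             - math.log(alpha * l1 + (1 - alpha) * l2))

l1, l2 = 0.5, 2.0
th1, th2 = -l1, -l2

# Sided KL equals the reverse-sided Bregman divergence: KL(p1 : p2) = B_F(th2 : th1).
assert math.isclose(kl_exp(l1, l2), bregman(th2, th1))

# Skewed Bhattacharyya distance equals the skewed Jensen divergence on natural parameters.
for alpha in (0.25, 0.5, 0.75):
    assert math.isclose(bhattacharyya_exp(l1, l2, alpha), jensen(th1, th2, alpha))
```

Both assertions pass for any valid rates, reflecting that the identities hold exactly (not just asymptotically) for exponential families.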