A solution for the mean parametrization of the von Mises-Fisher distribution (2404.07358v1)
Abstract: The von Mises-Fisher distribution, as an exponential family, can be expressed in terms of either its natural or its mean parameters. Unfortunately, the normalization function of the distribution in terms of its mean parameters is not available in closed form, which limits the practicality of the mean parametrization and complicates maximum-likelihood estimation more generally. We derive a second-order ordinary differential equation whose solution yields the mean-parameter normalizer together with its first two derivatives, as well as the variance function of the family. We also provide closed-form approximations to the solution of this differential equation. Together, these allow rapid evaluation of both densities and natural parameters in terms of mean parameters. We demonstrate applications to topic modeling with mixtures of von Mises-Fisher distributions using Bregman clustering.
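The quantity linking the two parametrizations is the modified-Bessel-function ratio A_d(κ) = I_{d/2}(κ) / I_{d/2−1}(κ), the norm of the vMF mean parameter at concentration κ; recovering κ from a mean parameter requires inverting it. A minimal sketch of this round trip, using SciPy and the well-known closed-form approximation of Banerjee et al. (2005) rather than the ODE-based solution proposed in the paper:

```python
import numpy as np
from scipy.special import ive  # exponentially scaled modified Bessel function I_v

def mean_resultant_length(kappa, d):
    """A_d(kappa) = I_{d/2}(kappa) / I_{d/2-1}(kappa), the norm of the
    vMF mean parameter for concentration kappa in d dimensions."""
    # ive(v, x) = I_v(x) * exp(-x); the exponential factors cancel in the
    # ratio, avoiding overflow for large kappa.
    return ive(d / 2, kappa) / ive(d / 2 - 1, kappa)

def kappa_from_r(r, d):
    """Closed-form approximation of Banerjee et al. (2005) inverting
    r = A_d(kappa): kappa ~= r * (d - r^2) / (1 - r^2)."""
    return r * (d - r**2) / (1 - r**2)

# Round trip: concentration -> mean-parameter norm -> concentration.
d, kappa = 10, 50.0
r = mean_resultant_length(kappa, d)   # lies in (0, 1)
kappa_hat = kappa_from_r(r, d)        # approximately recovers kappa
```

The exponential scaling in `ive` is the standard trick for evaluating Bessel-ratio quantities stably at large concentrations, where the unscaled I_v would overflow.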