
Semi-Nonparametric Estimation of Distribution Divergence in Non-Euclidean Spaces (2204.02031v2)

Published 5 Apr 2022 in cs.IT and math.IT

Abstract: This paper explores methods for estimating or approximating the total variation distance and the chi-squared divergence of probability measures within topological sample spaces, using independent and identically distributed samples. Our focus is on the practical scenario where the sample space is homeomorphic to subsets of Euclidean space, with the specific homeomorphism remaining unknown. Our proposed methods rely on the integral probability metric with witness functions in universal reproducing kernel Hilbert spaces (RKHSs). The estimators we develop consist of learnable parametric functions mapping the sample space to Euclidean space, paired with universal kernels defined in Euclidean space. This approach effectively overcomes the challenge of constructing universal kernels directly on non-Euclidean spaces. Furthermore, the estimators we devise demonstrate asymptotic consistency, and we provide a detailed statistical analysis, shedding light on their practical implementation.
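The estimators described in the abstract compose a learnable parametric map into Euclidean space with a universal kernel defined there, and evaluate an integral probability metric (IPM) on the embedded samples. The sketch below illustrates that construction using the squared maximum mean discrepancy (the IPM whose witness class is the unit ball of an RKHS); the Gaussian RBF kernel, the MLP embedding, the biased V-statistic estimator, and the training loop are illustrative assumptions for this sketch, not details taken from the paper.

```python
import torch
import torch.nn as nn

def gaussian_kernel(a, b, sigma=1.0):
    # Pairwise Gaussian RBF kernel: k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 sigma^2)).
    # A universal kernel on Euclidean space, applied to the embedded samples.
    sq_dists = torch.cdist(a, b, p=2) ** 2
    return torch.exp(-sq_dists / (2 * sigma ** 2))

class EmbeddingMap(nn.Module):
    # Hypothetical learnable map f_theta from the sample space (here, ambient
    # coordinates in R^in_dim) to Euclidean space R^out_dim.
    def __init__(self, in_dim, out_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x):
        return self.net(x)

def mmd_squared(x, y, f, sigma=1.0):
    # Biased V-statistic estimate of MMD^2 between samples x ~ P and y ~ Q,
    # computed on the embedded samples f(x), f(y).
    fx, fy = f(x), f(y)
    k_xx = gaussian_kernel(fx, fx, sigma).mean()
    k_yy = gaussian_kernel(fy, fy, sigma).mean()
    k_xy = gaussian_kernel(fx, fy, sigma).mean()
    return k_xx + k_yy - 2 * k_xy

if __name__ == "__main__":
    torch.manual_seed(0)
    x = torch.randn(256, 3)        # i.i.d. samples from P (toy data)
    y = torch.randn(256, 3) + 0.5  # i.i.d. samples from Q (shifted mean)
    f = EmbeddingMap(in_dim=3, out_dim=2)
    opt = torch.optim.Adam(f.parameters(), lr=1e-3)
    for _ in range(200):
        opt.zero_grad()
        loss = -mmd_squared(x, y, f)  # ascend the divergence surrogate
        loss.backward()
        opt.step()
    print(f"estimated MMD^2: {mmd_squared(x, y, f).item():.4f}")
```

Maximizing the estimate over the embedding parameters plays the role of searching over the family of learnable maps; since the Gaussian-kernel MMD is bounded, the objective cannot diverge even though the embedding itself is unconstrained.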
