Variational Approach for Efficient KL Divergence Estimation in Dirichlet Mixture Models (2403.12158v1)

Published 18 Mar 2024 in stat.ML, cs.LG, math.ST, and stat.TH

Abstract: This study tackles the efficient estimation of the Kullback-Leibler (KL) divergence in Dirichlet Mixture Models (DMMs), which is crucial for clustering compositional data. Despite the significance of DMMs, obtaining an analytically tractable solution for the KL divergence has proven elusive. Past approaches relied on computationally demanding Monte Carlo methods, motivating our introduction of a novel variational approach. Our method offers a closed-form solution, significantly enhancing computational efficiency and enabling swift model comparisons and robust estimation evaluations. Validation using real and simulated data showcases its superior efficiency and accuracy over traditional Monte Carlo-based methods, opening new avenues for rapid exploration of diverse DMMs and advancing statistical analyses of compositional data.
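The paper's exact closed-form estimator is not reproduced in this abstract, so the sketch below is only an illustration of the two ingredients the abstract contrasts: a closed-form variational approximation versus a Monte Carlo baseline. It pairs the well-known closed-form KL divergence between two single Dirichlet distributions with a Hershey-Olsen-style variational approximation for mixtures; this is a standard construction of that type, not necessarily the paper's method. All function and variable names (`kl_dirichlet`, `kl_dmm_variational`, `w_f`, `alphas_f`, and so on) are illustrative assumptions, and the code assumes NumPy and SciPy are available.

```python
import numpy as np
from scipy.special import digamma, gammaln, logsumexp
from scipy.stats import dirichlet


def kl_dirichlet(alpha, beta):
    """Closed-form KL( Dir(alpha) || Dir(beta) ) between two Dirichlet densities."""
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    a0, b0 = alpha.sum(), beta.sum()
    return (gammaln(a0) - gammaln(b0)
            - (gammaln(alpha) - gammaln(beta)).sum()
            + ((alpha - beta) * (digamma(alpha) - digamma(a0))).sum())


def kl_dmm_variational(w_f, alphas_f, w_g, alphas_g):
    """Hershey-Olsen-style variational approximation to KL(f || g) for
    Dirichlet mixtures f = sum_a w_f[a] Dir(alphas_f[a]) and
    g = sum_b w_g[b] Dir(alphas_g[b]). Closed form: no sampling needed."""
    total = 0.0
    for wa, fa in zip(w_f, alphas_f):
        # log sum_{a'} w_f[a'] exp(-KL(f_a || f_{a'}))
        num = logsumexp([np.log(w) - kl_dirichlet(fa, fo)
                         for w, fo in zip(w_f, alphas_f)])
        # log sum_b w_g[b] exp(-KL(f_a || g_b))
        den = logsumexp([np.log(w) - kl_dirichlet(fa, gb)
                         for w, gb in zip(w_g, alphas_g)])
        total += wa * (num - den)
    return total


def kl_dmm_monte_carlo(w_f, alphas_f, w_g, alphas_g, n=20_000, seed=0):
    """Monte Carlo baseline: draw x ~ f and average log f(x) - log g(x)."""
    rng = np.random.default_rng(seed)

    def log_mix(x, w, alphas):
        # Log-density of a Dirichlet mixture at a point x on the simplex.
        return logsumexp([np.log(wi) + dirichlet.logpdf(x, ai)
                          for wi, ai in zip(w, alphas)])

    comps = rng.choice(len(w_f), size=n, p=w_f)
    xs = [dirichlet.rvs(alphas_f[c], random_state=rng)[0] for c in comps]
    return float(np.mean([log_mix(x, w_f, alphas_f) - log_mix(x, w_g, alphas_g)
                          for x in xs]))


if __name__ == "__main__":
    # Two toy two-component Dirichlet mixtures on the probability simplex in R^3.
    w_f, alphas_f = [0.6, 0.4], [np.array([5.0, 2.0, 1.0]), np.array([1.0, 4.0, 3.0])]
    w_g, alphas_g = [0.5, 0.5], [np.array([4.0, 2.5, 1.5]), np.array([1.5, 3.5, 2.5])]
    print("variational:", kl_dmm_variational(w_f, alphas_f, w_g, alphas_g))
    print("monte carlo:", kl_dmm_monte_carlo(w_f, alphas_f, w_g, alphas_g))
```

The contrast in cost is the abstract's point: the variational approximation needs only O(A*A' + A*B) closed-form component-pair KL evaluations, while the Monte Carlo baseline needs tens of thousands of density evaluations per estimate, which is what makes the closed-form route attractive for repeated model comparisons.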
