
E$^2$M: Double Bounded $\alpha$-Divergence Optimization for Tensor-based Discrete Density Estimation (2405.18220v3)

Published 28 May 2024 in stat.ML and cs.LG

Abstract: Tensor-based discrete density estimation requires flexible modeling and proper divergence criteria to enable effective learning; however, traditional approaches using $\alpha$-divergence face analytical challenges due to the $\alpha$-power terms in the objective function, which hinder the derivation of closed-form update rules. We present a generalization of the expectation-maximization (EM) algorithm, called the E$^2$M algorithm. It circumvents this issue by first relaxing the optimization into minimization of a surrogate objective based on the Kullback-Leibler (KL) divergence, which is tractable via the standard EM algorithm, and subsequently applying a tensor many-body approximation in the M-step to enable simultaneous closed-form updates of all parameters. Our approach offers flexible modeling not only for a variety of low-rank structures, including the CP, Tucker, and Tensor Train formats, but also for their mixtures, allowing us to leverage the strengths of different low-rank structures. We demonstrate the effectiveness of our approach on classification and density estimation tasks.
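
To make the analytical difficulty concrete, one standard parameterization of the $\alpha$-divergence (Amari's convention; the paper's exact form may differ) between an empirical distribution $p$ and a model $q_\theta$ is

$$D_\alpha(p \,\|\, q_\theta) = \frac{1}{\alpha(1-\alpha)} \sum_x \Big( \alpha\, p(x) + (1-\alpha)\, q_\theta(x) - p(x)^{\alpha}\, q_\theta(x)^{1-\alpha} \Big),$$

so for $\alpha \in (0, 1)$ and a normalized model, minimizing over $\theta$ reduces to maximizing the coupling term $\sum_x p(x)^{\alpha}\, q_\theta(x)^{1-\alpha}$, whose $\alpha$-power is what blocks closed-form updates. Assuming a latent-variable decomposition $q_\theta(x) = \sum_z q_\theta(x, z)$ (an illustrative stand-in for the paper's low-rank tensor structures), the concavity of $t \mapsto t^{1-\alpha}$ and Jensen's inequality give, for any responsibility distribution $r(z \mid x)$,

$$q_\theta(x)^{1-\alpha} = \left( \sum_z r(z \mid x)\, \frac{q_\theta(x, z)}{r(z \mid x)} \right)^{1-\alpha} \;\geq\; \sum_z r(z \mid x) \left( \frac{q_\theta(x, z)}{r(z \mid x)} \right)^{1-\alpha},$$

which bounds the coupled objective from below by a term-wise tractable surrogate; alternating between tightening the bound in $r$ (E-step) and maximizing it in $\theta$ (M-step) mirrors the double-bound pattern the abstract describes. This is a sketch of the general mechanism under the stated assumptions, not the paper's exact derivation.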
