E$^2$M: Double Bounded $α$-Divergence Optimization for Tensor-based Discrete Density Estimation (2405.18220v3)
Abstract: Tensor-based discrete density estimation requires flexible modeling and a proper divergence criterion for effective learning; however, traditional approaches based on the $\alpha$-divergence face analytical challenges because the $\alpha$-power terms in the objective function hinder the derivation of closed-form update rules. We present a generalization of the expectation-maximization (EM) algorithm, called the E$^2$M algorithm. It circumvents this issue by first relaxing the optimization into the minimization of a surrogate objective based on the Kullback-Leibler (KL) divergence, which is tractable via the standard EM algorithm, and then applying a tensor many-body approximation in the M-step to enable simultaneous closed-form updates of all parameters. Our approach offers flexible modeling not only for a variety of low-rank structures, including the CP, Tucker, and Tensor Train formats, but also for their mixtures, allowing us to leverage the strengths of different low-rank structures. We demonstrate the effectiveness of our approach on classification and density estimation tasks.
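To make the KL-tractable setting concrete: the classical multiplicative-update scheme for nonnegative CP decomposition under the KL divergence admits exactly the kind of closed-form parameter updates the abstract refers to. The sketch below is a standard KL/EM-style baseline for a 3-way CP model, not the paper's E$^2$M algorithm; all function names are our own.

```python
import numpy as np

rng = np.random.default_rng(0)

def cp_reconstruct(A, B, C):
    # X_hat[i,j,k] = sum_r A[i,r] * B[j,r] * C[k,r]
    return np.einsum('ir,jr,kr->ijk', A, B, C)

def kl_ntf_cp(X, rank, n_iter=300, eps=1e-12):
    """Multiplicative updates minimizing KL(X || X_hat) for a 3-way
    nonnegative CP model. Each update is a closed-form MM/EM-type step,
    so the KL objective is monotonically nonincreasing."""
    I, J, K = X.shape
    A = rng.random((I, rank))
    B = rng.random((J, rank))
    C = rng.random((K, rank))
    for _ in range(n_iter):
        # Update A:  A[i,r] *= sum_{jk} (X/X_hat)[i,j,k] B[j,r] C[k,r]
        #                       / (sum_j B[j,r]) (sum_k C[k,r])
        R = X / (cp_reconstruct(A, B, C) + eps)
        A *= np.einsum('ijk,jr,kr->ir', R, B, C) / (B.sum(0) * C.sum(0) + eps)
        # Symmetric updates for B and C, recomputing the ratio tensor each time.
        R = X / (cp_reconstruct(A, B, C) + eps)
        B *= np.einsum('ijk,ir,kr->jr', R, A, C) / (A.sum(0) * C.sum(0) + eps)
        R = X / (cp_reconstruct(A, B, C) + eps)
        C *= np.einsum('ijk,ir,jr->kr', R, A, B) / (A.sum(0) * B.sum(0) + eps)
    return A, B, C
```

For general $\alpha$-divergences the analogous updates lose this closed form because of the $\alpha$-power terms, which is the gap the paper's double-bound relaxation to a KL surrogate is designed to close.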