Estimation of embedding vectors in high dimensions (2312.07802v2)
Abstract: Embeddings are a basic initial feature extraction step in many machine learning models, particularly in natural language processing. An embedding maps data tokens to a low-dimensional space in which similar tokens are assigned vectors that are close to one another under some metric. A basic question is how well such embeddings can be learned. To study this problem, we consider a simple probability model for discrete data in which there is some "true" but unknown embedding, and the correlation between random variables is related to the similarity of their embeddings. Under this model, it is shown that the embeddings can be learned by a variant of the low-rank approximate message passing (AMP) method. The AMP approach enables precise predictions of the estimation accuracy in certain high-dimensional limits. In particular, the methodology provides insight into how key parameters, such as the number of samples per value, the frequency of the terms, and the strength of the embedding correlation in the probability distribution, affect the estimation accuracy. Our theoretical findings are validated by simulations on both synthetic data and real text data.
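The abstract does not spell out the paper's specific AMP iteration, so as a flavor of how low-rank AMP methods operate, here is a minimal, self-contained sketch for the simplest rank-one symmetric spiked-matrix model (often called Z2 synchronization). The function name, the Rademacher prior, the tanh denoiser, and all parameter values are illustrative assumptions, not the paper's algorithm; the common structural ingredients are the prior-matched denoiser and the Onsager correction term.

```python
import numpy as np

def amp_rank_one(Y, lam, n_iters=50, seed=1):
    """Rank-one AMP sketch for the symmetric spiked model
        Y = (lam / n) * outer(x, x) + W,   x in {-1, +1}^n,  W ~ GOE(n).
    f(m) = tanh(lam * m) is the posterior-mean denoiser for the
    Rademacher prior; b is the Onsager correction that keeps the
    effective noise in the iterates approximately Gaussian.
    """
    n = Y.shape[0]
    rng = np.random.default_rng(seed)
    m = rng.normal(size=n) / np.sqrt(n)   # weakly informative start; near
                                          # the threshold a spectral
                                          # initialization is more robust
    f_prev = np.zeros(n)
    for _ in range(n_iters):
        f = np.tanh(lam * m)
        b = lam * np.mean(1.0 - f ** 2)   # (1/n) * sum_i f'(m_i)
        m, f_prev = Y @ f - b * f_prev, f
    return np.sign(m)

# Synthetic experiment: plant a spike, then try to recover it.
rng = np.random.default_rng(0)
n, lam = 2000, 2.0
x = rng.choice([-1.0, 1.0], size=n)
G = rng.normal(size=(n, n)) / np.sqrt(n)
W = (G + G.T) / np.sqrt(2)                # GOE: off-diagonal variance 1/n
Y = (lam / n) * np.outer(x, x) + W
x_hat = amp_rank_one(Y, lam)
print("overlap:", abs(x_hat @ x) / n)     # near 1 at lam = 2; recovery
                                          # fails below the lam = 1 threshold
```

The sign ambiguity in the recovered spike (the model cannot distinguish x from -x) is why the overlap is reported in absolute value; precise predictions of this overlap in the high-dimensional limit are exactly what the state-evolution analysis of AMP provides.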