Estimating Text Similarity based on Semantic Concept Embeddings (2401.04422v1)
Abstract: Thanks to their ease of use and high accuracy, Word2Vec (W2V) word embeddings enjoy great success in the semantic representation of words, sentences, and whole documents, as well as in semantic similarity estimation. However, they have the shortcoming that they are extracted directly from a surface representation of text, which neither adequately reflects human thought processes nor handles highly ambiguous words well. We therefore propose Semantic Concept Embeddings (CE) based on the MultiNet Semantic Network (SN) formalism, which addresses both shortcomings. An evaluation on a marketing target group distribution task showed that the accuracy of the predicted target groups can be increased by combining traditional word embeddings with semantic CEs.
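As a rough illustration of the combination idea described in the abstract, the minimal sketch below builds a document vector by averaging word embeddings and concept embeddings separately, concatenating the two centroids, and comparing documents with cosine similarity. This is an assumption-laden sketch, not the paper's actual method: the dictionaries `word_vecs` and `concept_vecs`, the helper names, and the toy 2-dimensional vectors are all hypothetical, and the paper's concrete combination scheme and similarity measure may differ.

```python
import numpy as np

def doc_vector(tokens, word_vecs, concept_vecs, dim_w, dim_c):
    """Average word embeddings and concept embeddings over the tokens
    (out-of-vocabulary tokens are skipped), then concatenate the two centroids."""
    w = [word_vecs[t] for t in tokens if t in word_vecs]
    c = [concept_vecs[t] for t in tokens if t in concept_vecs]
    w_centroid = np.mean(w, axis=0) if w else np.zeros(dim_w)
    c_centroid = np.mean(c, axis=0) if c else np.zeros(dim_c)
    return np.concatenate([w_centroid, c_centroid])

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    return float(u @ v / (nu * nv)) if nu and nv else 0.0

# Toy example with made-up embeddings (hypothetical values for illustration only).
word_vecs = {"bank": np.array([0.9, 0.1]), "loan": np.array([0.8, 0.2])}
concept_vecs = {"bank": np.array([0.2, 0.7]), "loan": np.array([0.3, 0.6])}

d1 = doc_vector(["bank", "loan"], word_vecs, concept_vecs, dim_w=2, dim_c=2)
d2 = doc_vector(["loan"], word_vecs, concept_vecs, dim_w=2, dim_c=2)
print(cosine(d1, d2))
```

The concatenation keeps the word-level and concept-level signals in separate subspaces, so the cosine comparison can benefit from both without one representation overwriting the other.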