A Comprehensive Empirical Evaluation of Existing Word Embedding Approaches (2303.07196v2)
Abstract: Vector-based word representations help countless NLP tasks capture a language's semantic and syntactic regularities. In this paper, we present the characteristics of existing word embedding approaches and analyze them with regard to many classification tasks. We categorize the methods into two main groups: traditional approaches, which mostly use matrix factorization to produce word representations and cannot capture the semantic and syntactic regularities of language very well, and neural-network-based approaches, which can capture sophisticated regularities of the language and preserve word relationships in the generated representations. We report experimental results on multiple classification tasks and highlight the scenarios where one approach performs better than the rest.
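To make the contrast the abstract draws concrete, here is a minimal sketch of the "traditional" route: dense word vectors obtained by factorizing a word-context co-occurrence matrix with truncated SVD. The toy corpus, window size, and embedding dimension are illustrative assumptions, not details from the paper; predictive, neural-network-based methods such as skip-gram instead learn vectors by training a network to predict a word's surrounding context.

```python
# Minimal sketch (not from the paper): count-based embeddings via
# SVD of a toy word-context co-occurrence matrix.
import numpy as np

# Toy corpus and window size are arbitrary illustrative choices.
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
]

# Build the vocabulary and a symmetric co-occurrence matrix (window = 1).
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}
C = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - 1), min(len(sent), i + 2)):
            if j != i:
                C[idx[w], idx[sent[j]]] += 1

# Traditional route: truncated SVD of the co-occurrence matrix yields
# dense word vectors; k is the embedding dimension, chosen arbitrarily.
U, S, _ = np.linalg.svd(C, full_matrices=False)
k = 2
embeddings = U[:, :k] * S[:k]
print(dict(zip(vocab, embeddings.round(2))))
```

In practice the raw counts are usually reweighted (for example with PPMI) before factorization; the raw-count version above is kept only to keep the sketch short.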