
Word Embedding with Neural Probabilistic Prior (2309.11824v1)

Published 21 Sep 2023 in cs.CL

Abstract: To improve word representation learning, we propose a probabilistic prior that can be seamlessly integrated with word embedding models. Unlike previous methods, word embedding is treated as a probabilistic generative model, which enables us to impose a prior that regularizes word representation learning. The proposed prior not only enhances the quality of the embedding vectors but also improves the model's robustness and stability. Its structure is simple and effective, and it can be easily implemented and flexibly plugged into most existing word embedding models. Extensive experiments show that the proposed method improves word representations on various tasks.
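The abstract describes the prior only at a high level, so the sketch below is an illustration rather than the paper's actual construction. It assumes skip-gram with negative sampling as the base embedding model and a standard Gaussian as a stand-in prior, added to the training objective as a weighted negative log-density. The class name `SkipGramWithPrior`, the `prior_weight` coefficient, and the Gaussian form are all assumptions made for this sketch, not details from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkipGramWithPrior(nn.Module):
    """Skip-gram with negative sampling, plus a prior term on the
    embedding vectors acting as a regularizer (hypothetical sketch)."""

    def __init__(self, vocab_size: int, dim: int, prior_weight: float = 1e-3):
        super().__init__()
        self.in_embed = nn.Embedding(vocab_size, dim)   # center-word vectors
        self.out_embed = nn.Embedding(vocab_size, dim)  # context-word vectors
        self.prior_weight = prior_weight

    def forward(self, center, context, negatives):
        v = self.in_embed(center)        # (B, D)
        u_pos = self.out_embed(context)  # (B, D)
        u_neg = self.out_embed(negatives)  # (B, K, D)

        # Standard skip-gram negative-sampling objective.
        pos_score = F.logsigmoid((v * u_pos).sum(-1))                          # (B,)
        neg_score = F.logsigmoid(-(u_neg @ v.unsqueeze(-1)).squeeze(-1)).sum(-1)  # (B,)
        nll = -(pos_score + neg_score).mean()

        # Stand-in prior: negative log-density of a standard Gaussian over
        # the center-word embeddings (up to a constant). The paper's prior
        # is more structured; this Gaussian is only an illustration.
        prior_nll = 0.5 * v.pow(2).sum(-1).mean()

        return nll + self.prior_weight * prior_nll


if __name__ == "__main__":
    model = SkipGramWithPrior(vocab_size=10_000, dim=100)
    center = torch.randint(0, 10_000, (32,))
    context = torch.randint(0, 10_000, (32,))
    negatives = torch.randint(0, 10_000, (32, 5))
    loss = model(center, context, negatives)
    loss.backward()
    print(float(loss))
```

Because the prior enters the loss as a single additive term, the same pattern plugs into other embedding objectives (e.g., GloVe-style losses) without changing the base model, which matches the abstract's claim that the prior can be flexibly attached to most existing word embedding models.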

