
Representation Learning via Variational Bayesian Networks

Published 28 Jun 2023 in cs.LG, cs.AI, and cs.CL | arXiv:2306.16326v1

Abstract: We present the Variational Bayesian Network (VBN), a novel Bayesian entity representation learning model that utilizes hierarchical and relational side information and is particularly useful for modeling entities in the "long tail", where data is scarce. VBN provides better modeling for long-tail entities via two complementary mechanisms. First, VBN employs informative hierarchical priors that enable information propagation between entities sharing common ancestors; in addition, VBN models explicit relations between entities that enforce complementary structure and consistency, guiding the learned representations towards a more meaningful arrangement in space. Second, VBN represents entities by densities (rather than vectors), hence modeling uncertainty that plays a complementary role in coping with data scarcity. Finally, we propose a scalable Variational Bayes optimization algorithm that enables fast approximate Bayesian inference. We evaluate the effectiveness of VBN on linguistic, recommendation, and medical inference tasks. Our findings show that VBN outperforms existing methods across multiple datasets, especially in the long tail.
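To make the hierarchical-prior mechanism concrete, the following is a rough, self-contained sketch (not the paper's actual algorithm) of the core idea: each entity is represented by a diagonal-Gaussian density, and a KL-divergence penalty toward an ancestor's density lets a data-scarce "long-tail" child borrow statistical strength from its parent. All names, dimensions, and step sizes here are hypothetical.

```python
import numpy as np

def kl_diag_gauss(mu_q, var_q, mu_p, var_p):
    # KL( N(mu_q, var_q) || N(mu_p, var_p) ) for diagonal Gaussians.
    return 0.5 * np.sum(
        np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    )

rng = np.random.default_rng(0)
d = 8  # hypothetical embedding dimension

# Ancestor ("parent") density, assumed well-estimated from plentiful data.
parent_mu, parent_var = rng.normal(size=d), np.full(d, 0.5)

# Long-tail child entity: few observations, so its point estimate is noisy.
child_mu = parent_mu + rng.normal(scale=2.0, size=d)
child_var = np.full(d, 1.0)

# Hierarchical-prior penalty: pulls the child's density toward its parent's.
penalty = kl_diag_gauss(child_mu, child_var, parent_mu, parent_var)

# One gradient step on the KL term w.r.t. the child mean shrinks it
# toward the parent mean -- information propagating down the hierarchy.
grad_mu = (child_mu - parent_mu) / parent_var
child_mu_new = child_mu - 0.1 * grad_mu

closer = np.linalg.norm(child_mu_new - parent_mu) < np.linalg.norm(child_mu - parent_mu)
print(penalty > 0, closer)
```

In the full model this penalty would be one term of a variational (ELBO) objective, traded off against a likelihood term on the entity's own observations, so well-observed entities rely on their data while sparse ones fall back on their ancestors.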

