Social World Knowledge: Modeling and Applications (2306.16299v1)

Published 28 Jun 2023 in cs.AI and cs.SI

Abstract: Social world knowledge is a key ingredient in effective communication and information processing by humans and machines alike. As of today, there exist many knowledge bases that represent factual world knowledge. Yet, there is no resource that is designed to capture social aspects of world knowledge. We believe that this work makes an important step towards the formulation and construction of such a resource. We introduce SocialVec, a general framework for eliciting low-dimensional entity embeddings from the social contexts in which they occur in social networks. In this framework, entities correspond to highly popular accounts which invoke general interest. We assume that entities that individual users tend to co-follow are socially related, and use this definition of social context to learn the entity embeddings. Similar to word embeddings which facilitate tasks that involve text semantics, we expect the learned social entity embeddings to benefit multiple tasks of social flavor. In this work, we elicited the social embeddings of roughly 200K entities from a sample of 1.3M Twitter users and the accounts that they follow. We employ and gauge the resulting embeddings on two tasks of social importance. First, we assess the political bias of news sources in terms of entity similarity in the social embedding space. Second, we predict the personal traits of individual Twitter users based on the social embeddings of entities that they follow. In both cases, we show advantageous or competitive performance using our approach compared with task-specific baselines. We further show that existing entity embedding schemes, which are fact-based, fail to capture social aspects of knowledge. We make the learned social entity embeddings available to the research community to support further exploration of social world knowledge and its applications.
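The co-follow idea in the abstract can be illustrated with a minimal sketch. This is not the SocialVec training procedure: instead of learned low-dimensional embeddings, it swaps in sparse count-based PPMI (positive pointwise mutual information) vectors built from co-follow counts, which is enough to show how "entities that users co-follow" become "entities that are close in vector space". The account names, the toy follow lists, and the helper names `cofollow_ppmi`, `cosine`, and `user_vector` are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cofollow_ppmi(follows):
    """Map each entity to a sparse PPMI vector over the entities co-followed with it.

    `follows` maps a user id to the list of popular accounts that user follows.
    Assumption: PPMI over co-follow counts stands in for learned embeddings.
    """
    pair_counts = Counter()
    for accounts in follows.values():
        # Every pair of accounts followed by the same user is one co-follow event.
        for a, b in combinations(sorted(set(accounts)), 2):
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1

    entity_counts = Counter()
    for (a, _), count in pair_counts.items():
        entity_counts[a] += count
    total = sum(pair_counts.values())

    vectors = {}
    for (a, b), count in pair_counts.items():
        pmi = math.log(count * total / (entity_counts[a] * entity_counts[b]))
        if pmi > 0:  # keep only positive associations
            vectors.setdefault(a, {})[b] = pmi
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def user_vector(accounts, vectors):
    """Represent a user as the mean of the vectors of the entities they follow."""
    known = [vectors[a] for a in accounts if a in vectors]
    if not known:
        return {}
    summed = Counter()
    for vec in known:
        for k, w in vec.items():
            summed[k] += w
    return {k: w / len(known) for k, w in summed.items()}


# Hypothetical toy sample: six users and the popular accounts they follow.
follows = {
    "u1": ["news_left", "pundit_left", "senator_left"],
    "u2": ["news_left", "pundit_left", "senator_left"],
    "u3": ["news_right", "pundit_right", "senator_right"],
    "u4": ["news_right", "pundit_right", "senator_right"],
    "u5": ["news_left", "team_a"],
    "u6": ["news_right", "team_a"],
}
vectors = cofollow_ppmi(follows)

same_side = cosine(vectors["news_left"], vectors["pundit_left"])
cross_side = cosine(vectors["news_left"], vectors["news_right"])
print(same_side > cross_side)
```

In this toy sample, the two news accounts are linked only through the shared `team_a` audience, so `news_left` ends up closer to `pundit_left` than to `news_right`. That is the kind of signal the abstract describes using to assess the political bias of news sources. `user_vector` mirrors the second task in spirit: a user is summarized by averaging the vectors of the entities they follow, and that summary could then feed a trait classifier.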
