
DIVERSE: A Dataset of YouTube Video Comment Stances with a Data Programming Model (2403.03334v3)

Published 5 Mar 2024 in cs.CL and cs.AI

Abstract: Public opinion of military organizations significantly influences their ability to recruit talented individuals. As recruitment efforts increasingly extend into digital spaces like social media, it becomes essential to assess the stance of social media users toward online military content. However, there is a notable lack of data for analyzing opinions on military recruiting efforts online, compounded by challenges in stance labeling, which is crucial for understanding public perceptions. Despite the importance of stance analysis for successful online military recruitment, creating human-annotated, in-domain stance labels is resource-intensive. In this paper, we address both the challenges of stance labeling and the scarcity of data on public opinions of online military recruitment by introducing and releasing the DIVERSE dataset: https://doi.org/10.5281/zenodo.10493803. This dataset comprises all comments from the U.S. Army's official YouTube Channel videos. We employed a state-of-the-art weak supervision approach, leveraging LLMs to label the stance of each comment toward its respective video and the U.S. Army. Our findings indicate that the U.S. Army's videos began attracting a significant number of comments post-2021, with the stance distribution generally balanced among supportive, oppositional, and neutral comments, with a slight skew towards oppositional versus supportive comments.
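The weak supervision idea mentioned in the abstract can be illustrated with a minimal sketch of data programming style label aggregation. This is not the authors' model: the three labeling functions, their keyword rules, and the tie-breaking policy below are hypothetical, and a plain majority vote stands in for the learned label model that a real data programming pipeline (with LLM-based labelers) would use.

```python
from collections import Counter

ABSTAIN = None  # a labeling function returns this when it has no opinion

# Hypothetical keyword-based labeling functions (illustrative only).
def lf_thanks(comment):
    return "support" if "thank you" in comment.lower() else ABSTAIN

def lf_never_join(comment):
    return "oppose" if "never join" in comment.lower() else ABSTAIN

def lf_question(comment):
    return "neutral" if comment.strip().endswith("?") else ABSTAIN

LABELING_FUNCTIONS = [lf_thanks, lf_never_join, lf_question]

def majority_vote(comment, lfs=LABELING_FUNCTIONS):
    """Combine noisy votes into one stance label, abstaining on ties."""
    votes = [v for v in (lf(comment) for lf in lfs) if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    # If the top two labels have equal support, do not guess.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]

if __name__ == "__main__":
    for text in [
        "Thank you for your service!",
        "I would never join.",
        "How long is basic training?",
    ]:
        print(text, "->", majority_vote(text))
```

Comments on which every function abstains, or on which the votes tie, stay unlabeled in this sketch; a learned label model would instead weight each labeling function by its estimated accuracy.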
