Scope of Large Language Models for Mining Emerging Opinions in Online Health Discourse (2403.03336v1)
Abstract: In this paper, we develop an LLM-powered framework for the curation and evaluation of emerging opinion mining in online health communities. We formulate emerging opinion mining as a pairwise stance detection problem between (title, comment) pairs sourced from Reddit, where post titles contain emerging health-related claims on a topic that is not predefined. The claims are either explicitly or implicitly expressed by the user. We detail (i) a method of claim identification, the task of identifying whether a post title contains a claim, and (ii) an opinion mining-driven evaluation framework for stance detection using LLMs. We facilitate our exploration by releasing a novel test dataset, Long COVID-Stance, or LC-Stance, which can be used to evaluate LLMs on the tasks of claim identification and stance detection in online health communities. Long COVID is an emerging post-COVID disorder with uncertain and complex treatment guidelines, making it a suitable use case for our task. LC-Stance contains long COVID treatment-related discourse sourced from a Reddit community. Our evaluation shows that GPT-4 significantly outperforms prior work on zero-shot stance detection. We then perform thorough LLM diagnostics, identifying the role of claim type (i.e., implicit vs. explicit claims) and comment length as sources of model error.