Characterizing Information Seeking Events in Health-Related Social Discourse (2308.09156v2)
Abstract: Social media sites have become a popular platform for individuals to seek and share health information. Despite the progress in natural language processing for social media mining, a gap remains in analyzing health-related texts in social discourse in the context of events. Event-driven analysis can offer insights into different facets of healthcare at an individual and collective level, including treatment options, misconceptions, and knowledge gaps. This paper presents a paradigm to characterize health-related information-seeking in social discourse through the lens of events. Events here are broad categories defined with domain experts that capture the trajectory of the treatment/medication. To illustrate the value of this approach, we analyze Reddit posts regarding medications for Opioid Use Disorder (OUD), a critical global health concern. To the best of our knowledge, this is the first attempt to define event categories for characterizing information-seeking in OUD social discourse. Guided by domain experts, we develop TREAT-ISE, a novel multilabel treatment information-seeking event dataset to analyze online discourse within an event-based framework. This dataset contains Reddit posts on information-seeking events related to recovery from OUD, where each post is annotated based on the type of events. We also establish a strong performance benchmark (77.4% F1 score) for the task by employing several machine learning and deep learning classifiers. Finally, we thoroughly investigate the performance and errors of ChatGPT on this task, providing valuable insights into the LLM's capabilities and ongoing characterization efforts.
Authors:
- Omar Sharif
- Madhusudan Basak
- Tanzia Parvin
- Ava Scharfstein
- Alphonso Bradham
- Jacob T. Borodovsky
- Sarah E. Lord
- Sarah M. Preum