A Comparative Study on Patient Language across Therapeutic Domains for Effective Patient Voice Classification in Online Health Discussions (2407.16593v1)
Abstract: There exists an invisible barrier between healthcare professionals' perception of a patient's clinical experience and the reality. This barrier may be induced by an environment that hinders patients from sharing their experiences openly with healthcare professionals. As patients are observed to discuss and exchange knowledge more candidly on social media, valuable insights can be drawn from these platforms. However, the abundance of non-patient posts on social media necessitates filtering out such irrelevant content to distinguish the genuine voices of patients, a task we refer to as patient voice classification. In this study, we analyse the importance of linguistic characteristics in accurately classifying patient voices. Our findings underscore the essential role of linguistic and statistical text similarity analysis in identifying common patterns among patient groups. These results point to even starker differences in the way patients express themselves at a disease level and across various therapeutic domains. Additionally, we fine-tuned a pre-trained LLM on combined datasets with similar linguistic patterns, resulting in highly accurate automatic patient voice classification. This is the pioneering study on the topic, and its focus on extracting authentic patient experiences from social media is a crucial step towards advancing healthcare standards and fostering a patient-centric approach.
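The abstract does not say how the statistical text similarity analysis is computed. As a rough illustration only, the sketch below compares posts using TF-IDF weighting and cosine similarity, one common way to measure lexical overlap between texts. The example posts, the domain labels, and the function names (`tokenize`, `tfidf_vectors`, `cosine`) are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (tok.strip(".,;:!?()\"'") for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF, so terms that occur in every document keep a nonzero weight.
        vectors.append({
            term: (count / len(toks)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical posts: two from one therapeutic domain, one from another.
posts = {
    "cardio_a": "My chest pain got worse after the new heart medication.",
    "cardio_b": "The heart medication gave me chest pain and dizziness.",
    "derm": "This cream cleared the rash on my skin in a week.",
}

vectors = tfidf_vectors(list(posts.values()))
sim_same_domain = cosine(vectors[0], vectors[1])
sim_cross_domain = cosine(vectors[0], vectors[2])

print(f"same domain:  {sim_same_domain:.3f}")
print(f"cross domain: {sim_cross_domain:.3f}")
```

On this toy data the two cardiology posts share more vocabulary than the cardiology and dermatology posts do, so the same-domain score comes out higher. A real analysis would work on aggregated corpora per disease or domain and would likely use a proper tokenizer and stop-word handling.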
Authors:
- Giorgos Lysandrou
- Roma English Owen
- Vanja Popovic
- Grant Le Brun
- Aryo Pradipta Gema
- Beatrice Alex
- Elizabeth A. L. Fairley