
Neural Language Models with Distant Supervision to Identify Major Depressive Disorder from Clinical Notes (2104.09644v1)

Published 19 Apr 2021 in cs.CL, cs.AI, and cs.IR

Abstract: Major depressive disorder (MDD) is a prevalent psychiatric disorder associated with a significant healthcare burden worldwide. Phenotyping of MDD can support early diagnosis and may consequently offer significant advantages in patient management. In prior research, MDD phenotypes have been extracted from structured Electronic Health Record (EHR) data or predicted from electroencephalographic (EEG) data using traditional machine learning models. However, MDD phenotypic information is also documented in free-text EHR data, such as clinical notes. While clinical notes may provide more accurate phenotyping information, natural language processing (NLP) algorithms must be developed to abstract such information. Recent advancements in NLP have produced state-of-the-art neural language models, such as Bidirectional Encoder Representations from Transformers (BERT), a transformer-based model that can be pre-trained on a corpus of unlabeled text and then fine-tuned on specific tasks. However, such neural language models have been underutilized in clinical NLP tasks due to the lack of large training datasets. In the literature, researchers have used the distant supervision paradigm to train machine learning models on clinical text classification tasks, mitigating the lack of annotated training data. It is still unknown whether the paradigm is effective for neural language models. In this paper, we propose to leverage neural language models in a distant supervision paradigm to identify MDD phenotypes from clinical notes. The experimental results indicate that our proposed approach is effective in identifying MDD phenotypes and that Bio-Clinical BERT, a BERT model specific to clinical data, achieved the best performance in comparison with conventional machine learning models.

Authors (9)
  1. Bhavani Singh Agnikula Kshatriya (2 papers)
  2. Nicolas A Nunez (1 paper)
  3. Manuel Gardea-Resendez (1 paper)
  4. Euijung Ryu (2 papers)
  5. Brandon J Coombes (1 paper)
  6. Sunyang Fu (9 papers)
  7. Mark A Frye (1 paper)
  8. Joanna M Biernacka (2 papers)
  9. Yanshan Wang (50 papers)
Citations (4)