ALBA: Adaptive Language-based Assessments for Mental Health (2311.06467v2)
Abstract: Mental health issues differ widely among individuals, with varied signs and symptoms. Recently, language-based assessments have shown promise in capturing this diversity, but they require a substantial sample of words per person for accuracy. This work introduces the task of Adaptive Language-Based Assessment (ALBA), which involves adaptively ordering questions while also scoring an individual's latent psychological trait using limited language responses to previous questions. To this end, we develop adaptive testing methods under two psychometric measurement theories: Classical Test Theory and Item Response Theory. We empirically evaluate ordering and scoring strategies, organizing them into two new methods: a semi-supervised item response theory-based method (ALIRT) and a supervised Actor-Critic model. While we found both methods to improve over non-adaptive baselines, we found ALIRT to be the most accurate and scalable, achieving the highest accuracy with fewer questions (e.g., Pearson r ~ 0.93 after only 3 questions, as compared to typically needing at least 7 questions). In general, adaptive language-based assessments of depression and anxiety were able to utilize a smaller sample of language without compromising validity or incurring large computational costs.
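To make the idea of adaptively ordering questions while scoring a latent trait concrete, here is a minimal sketch of a generic adaptive test under a two-parameter logistic (2PL) Item Response Theory model. It is not the ALIRT method from the paper: the binary responses, the maximum-Fisher-information selection rule, the grid-based posterior-mean scoring, and all names (`p_endorse`, `run_adaptive_test`, `answer_fn`, the item parameters `a` and `b`) are assumptions chosen for illustration. In ALBA the responses are open-ended language, which would first have to be mapped to item scores.

```python
import numpy as np

def p_endorse(theta, a, b):
    """2PL probability of endorsing an item at trait level theta."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def fisher_information(theta, a, b):
    """Information a 2PL item carries about theta."""
    p = p_endorse(theta, a, b)
    return a ** 2 * p * (1.0 - p)

def eap_estimate(responses, a, b, grid):
    """Posterior-mean trait estimate under a standard normal prior (assumed)."""
    log_post = -0.5 * grid ** 2
    for item, y in responses.items():
        p = p_endorse(grid, a[item], b[item])
        log_post = log_post + y * np.log(p) + (1 - y) * np.log(1.0 - p)
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    return float(np.sum(grid * post))

def run_adaptive_test(answer_fn, a, b, n_questions=3):
    """Ask n_questions items, each time picking the most informative one."""
    grid = np.linspace(-4.0, 4.0, 161)
    responses = {}   # item index -> binary response, in the order asked
    theta = 0.0      # start at the prior mean
    for _ in range(n_questions):
        remaining = [i for i in range(len(a)) if i not in responses]
        info = [fisher_information(theta, a[i], b[i]) for i in remaining]
        item = remaining[int(np.argmax(info))]
        responses[item] = answer_fn(item)
        theta = eap_estimate(responses, a, b, grid)
    return theta, list(responses)

# Hypothetical item bank: discriminations (a) and difficulties (b).
a = np.array([1.0, 1.5, 0.8, 2.0, 1.2])
b = np.array([-1.0, 0.0, 0.5, 1.0, -0.5])
theta_hat, order = run_adaptive_test(lambda item: 1, a, b, n_questions=3)
print(order, round(theta_hat, 2))
```

The loop alternates between the two sub-tasks named in the abstract: scoring (updating `theta` from the responses so far) and ordering (choosing the next question given that score). Other selection or scoring rules could be swapped in without changing the overall structure.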