Detecting anxiety from short clips of free-form speech (2312.15272v1)
Published 23 Dec 2023 in cs.CL, cs.CY, cs.LG, cs.SD, and eess.AS
Abstract: Barriers to accessing mental health assessments, including cost and stigma, continue to impede mental health diagnosis and treatment. Machine learning approaches based on speech samples could help lower these barriers. In this work, we develop machine learning solutions to diagnose anxiety disorders from patients' audio journals. We work on a novel anxiety dataset (provided through a collaboration with Kintsugi Mindful Wellness Inc.) and experiment with several models of varying complexity that use audio, text, and a combination of both modalities. We show that the multimodal and audio-embedding-based approaches perform well on this task, achieving a ROC AUC of 0.68 to 0.69.
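To make the reported metric concrete: ROC AUC is the probability that a randomly chosen positive (anxious) example receives a higher score than a randomly chosen negative one, so 0.5 corresponds to random ranking and 1.0 to perfect ranking. The sketch below computes it directly from that pairwise definition. It also includes a simple late-fusion step that averages per-modality probabilities; this fusion rule and the name `late_fusion` are hypothetical illustrations, not the authors' method, which the abstract does not specify.

```python
from typing import List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via its pairwise (Mann-Whitney) definition.

    Counts, over all (positive, negative) pairs, how often the positive
    example is scored higher; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def late_fusion(audio_probs: Sequence[float],
                text_probs: Sequence[float],
                w_audio: float = 0.5) -> List[float]:
    """Hypothetical late fusion: weighted average of audio and text probabilities."""
    if len(audio_probs) != len(text_probs):
        raise ValueError("modalities must score the same examples")
    return [w_audio * a + (1.0 - w_audio) * t
            for a, t in zip(audio_probs, text_probs)]


if __name__ == "__main__":
    labels = [1, 0, 1, 0]             # 1 = anxious, 0 = not anxious (toy data)
    audio = [0.7, 0.4, 0.5, 0.6]      # toy audio-model probabilities
    text = [0.6, 0.3, 0.4, 0.5]       # toy text-model probabilities
    print(roc_auc(labels, audio))                     # 0.75
    print(roc_auc(labels, late_fusion(audio, text)))  # 0.75
```

The toy numbers are invented for illustration only. The pairwise loop is quadratic in the number of examples; for larger evaluation sets a sort-based (rank) computation gives the same value more efficiently.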
- Prabhat Agarwal
- Akshat Jindal
- Shreya Singh