Hierarchical attention interpretation: an interpretable speech-level transformer for bi-modal depression detection (2309.13476v2)

Published 23 Sep 2023 in cs.CL, cs.SD, and eess.AS

Abstract: Depression is a common mental disorder. Automatic speech-based depression detection tools, enabled by machine learning, can aid early screening for depression. This paper addresses two limitations that may hinder the clinical implementation of such tools: label noise resulting from segment-level labelling, and a lack of model interpretability. We propose a bi-modal speech-level transformer that avoids segment-level labelling, and we introduce a hierarchical interpretation approach that provides both speech-level and sentence-level interpretations. The approach is based on gradient-weighted attention maps derived from all attention layers, which track interactions between input features. We show that the proposed model outperforms a model that learns at the segment level (precision $p$=0.854, recall $r$=0.947, $F_1$=0.897, compared to $p$=0.732, $r$=0.808, $F_1$=0.768). For model interpretation, using one true positive sample, we show which sentences within a given speech are most relevant to depression detection, and which text tokens and Mel-spectrogram regions within these sentences are most relevant. These interpretations allow clinicians to verify the validity of predictions made by depression detection tools, promoting their clinical implementation.
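The interpretation step rests on gradient-weighted attention maps accumulated across all attention layers. The sketch below is a minimal illustration under stated assumptions. It assumes the self-attention relevance rule of Chefer et al. (2021), "Generic Attention-Model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers". That rule averages ReLU(∇A ⊙ A) over heads and then accumulates R ← R + Ā·R layer by layer, starting from R = I. The abstract does not spell out this exact update. The sketch also ignores cross-modal attention layers. All function names are hypothetical. Attention maps and their gradients are assumed to have already been extracted from a trained model as plain nested lists, with index 0 as the classification token.

```python
from typing import List, Tuple

Matrix = List[List[float]]
# One layer = (attention maps per head, gradients of the prediction w.r.t. those maps per head).
Layer = Tuple[List[Matrix], List[Matrix]]


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(inner)) for j in range(cols)]
            for i in range(rows)]


def grad_weighted_mean(attn_heads: List[Matrix], grad_heads: List[Matrix]) -> Matrix:
    """Head-averaged ReLU(grad * attention): keeps only positively contributing attention."""
    h, n = len(attn_heads), len(attn_heads[0])
    out = [[0.0] * n for _ in range(n)]
    for a, g in zip(attn_heads, grad_heads):
        for i in range(n):
            for j in range(n):
                out[i][j] += max(0.0, a[i][j] * g[i][j]) / h
    return out


def relevance_map(layers: List[Layer]) -> Matrix:
    """Accumulate R <- R + A_bar @ R over layers (first to last), starting from identity.

    Assumed rule for self-attention only (Chefer et al., 2021); not necessarily the paper's code.
    """
    n = len(layers[0][0][0])
    r = identity(n)
    for attn_heads, grad_heads in layers:
        update = matmul(grad_weighted_mean(attn_heads, grad_heads), r)
        r = [[r[i][j] + update[i][j] for j in range(n)] for i in range(n)]
    return r


def rank_inputs(r: Matrix, cls_index: int = 0) -> List[int]:
    """Rank input positions by their relevance to the classification token, most relevant first."""
    row = r[cls_index]
    candidates = [j for j in range(len(row)) if j != cls_index]
    return sorted(candidates, key=lambda j: row[j], reverse=True)


if __name__ == "__main__":
    # Toy single-layer, single-head example with inputs [CLS, sentence 1, sentence 2].
    attn = [[0.2, 0.7, 0.1], [0.3, 0.4, 0.3], [0.1, 0.1, 0.8]]
    grad = [[1.0, -1.0, 2.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
    r = relevance_map([([attn], [grad])])
    # Sentence 1 gets high attention but a negative gradient, so ReLU removes it.
    print(rank_inputs(r))  # → [2, 1]
```

Applied hierarchically, the same ranking could be run once over a speech-level model whose inputs are sentence representations, to select the most relevant sentences, and then again over a sentence-level encoder to rank text tokens or Mel-spectrogram patches within those sentences. This two-pass use illustrates the idea described in the abstract and is not the paper's implementation.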
