Listenable Maps for Audio Classifiers (2403.13086v3)

Published 19 Mar 2024 in cs.SD, cs.LG, eess.AS, and eess.SP

Abstract: Despite the impressive performance of deep learning models across diverse tasks, their complexity poses challenges for interpretation. This challenge is particularly evident for audio signals, where conveying interpretations becomes inherently difficult. To address this issue, we introduce Listenable Maps for Audio Classifiers (L-MAC), a post-hoc interpretation method that generates faithful and listenable interpretations. L-MAC uses a decoder on top of a pretrained classifier to generate binary masks that highlight relevant portions of the input audio. We train the decoder with a loss function that maximizes the confidence of the classifier's decision on the masked-in portion of the audio while minimizing the probability of that decision on the masked-out portion. Quantitative evaluations on both in-domain and out-of-domain data demonstrate that L-MAC consistently produces more faithful interpretations than several gradient- and masking-based methodologies. Furthermore, a user study confirms that, on average, users prefer the interpretations generated by the proposed technique.
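
The abstract describes the training objective only in prose. The sketch below is a minimal PyTorch illustration of a masking loss of that general shape: a mask is applied to the input, and the loss rewards the classifier's confidence on the masked-in portion while penalizing the probability it assigns to the same decision on the masked-out portion. The function name `lmac_style_loss`, the weights `alpha` and `beta`, and the exact form of both terms are assumptions made for illustration; this is not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def lmac_style_loss(classifier, x, mask, target, alpha=1.0, beta=0.2):
    """Hypothetical sketch of an L-MAC-style masking objective.

    classifier: pretrained model mapping inputs to class logits (frozen)
    x:          input representation, e.g. a magnitude spectrogram
    mask:       decoder output in [0, 1], same shape as x
    target:     class indices of the classifier's original decision
    """
    logits_in = classifier(mask * x)            # masked-in portion
    logits_out = classifier((1.0 - mask) * x)   # masked-out portion

    # Maximize confidence of the original decision on the masked-in audio...
    loss_in = F.cross_entropy(logits_in, target)

    # ...while minimizing the probability assigned to that same decision
    # on the masked-out audio.
    prob_out = F.softmax(logits_out, dim=-1).gather(1, target.unsqueeze(1))
    loss_out = prob_out.mean()

    return alpha * loss_in + beta * loss_out

# Hypothetical usage: a toy 4-class classifier over random "spectrograms".
if __name__ == "__main__":
    clf = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(80 * 100, 4))
    x = torch.randn(2, 80, 100)                 # batch of spectrograms
    mask = torch.sigmoid(torch.randn_like(x))   # stand-in for the decoder output
    target = clf(x).argmax(dim=-1)              # the classifier's original decision
    print(lmac_style_loss(clf, x, mask, target).item())
```

In the method as described, the mask comes from a decoder trained on top of the pretrained classifier and is binarized so that the masked-in audio remains listenable; the sketch above omits the decoder architecture and binarization.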
