Tackling Interpretability in Audio Classification Networks with Non-negative Matrix Factorization (2305.07132v1)

Published 11 May 2023 in cs.SD, cs.LG, and eess.AS

Abstract: This paper tackles two major problem settings for interpretability of audio processing networks, post-hoc and by-design interpretation. For post-hoc interpretation, we aim to interpret decisions of a network in terms of high-level audio objects that are also listenable for the end-user. This is extended to present an inherently interpretable model with high performance. To this end, we propose a novel interpreter design that incorporates non-negative matrix factorization (NMF). In particular, an interpreter is trained to generate a regularized intermediate embedding from hidden layers of a target network, learnt as time-activations of a pre-learnt NMF dictionary. Our methodology allows us to generate intuitive audio-based interpretations that explicitly enhance parts of the input signal most relevant for a network's decision. We demonstrate our method's applicability on a variety of classification tasks, including multi-label data for real-world audio and music.
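
To make the phrase "time-activations of a pre-learnt NMF dictionary" concrete, here is a minimal sketch of the underlying NMF idea only. It is not the paper's interpreter, which the abstract describes as a trained network operating on hidden layers of the target classifier. The sketch assumes a fixed non-negative dictionary `W` (frequency bins by components) and estimates activations `H` (components by time frames) so that a non-negative spectrogram `V` is approximated by `W @ H`, using the standard multiplicative update for the Frobenius loss. The function name `nmf_activations`, the matrix sizes, and the synthetic data are all illustrative assumptions.

```python
import numpy as np

def nmf_activations(V, W, n_iter=200, eps=1e-10, seed=0):
    """Estimate non-negative activations H so that V is approximately W @ H.

    W is held fixed (a "pre-learnt" dictionary); only H is updated, using
    the standard multiplicative rule for the Frobenius reconstruction loss.
    """
    rng = np.random.default_rng(seed)
    n_components, n_frames = W.shape[1], V.shape[1]
    # Strictly positive random start so multiplicative updates can move every entry.
    H = rng.random((n_components, n_frames)) + eps
    for _ in range(n_iter):
        # The ratio of non-negative terms keeps H non-negative at every step.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H

# Synthetic stand-in for a magnitude spectrogram (assumed sizes, not from the paper).
rng = np.random.default_rng(42)
n_freq, n_components, n_frames = 64, 8, 100
W = rng.random((n_freq, n_components))        # hypothetical pre-learnt dictionary
H_true = rng.random((n_components, n_frames))
V = W @ H_true

H = nmf_activations(V, W)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.4f}")
```

Because each column of `W` is a spectral pattern and each row of `H` says when that pattern is active, a subset of components can be resynthesized as `W[:, idx] @ H[idx, :]`. This is the general property that makes NMF-based explanations listenable, under the assumptions stated above.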

Authors (5)
  1. Jayneel Parekh (9 papers)
  2. Sanjeel Parekh (9 papers)
  3. Pavlo Mozharovskyi (37 papers)
  4. Gaël Richard (46 papers)
  5. Florence d'Alché-Buc (34 papers)
Citations (3)
