
Explainable by-design Audio Segmentation through Non-Negative Matrix Factorization and Probing (2406.13385v1)

Published 19 Jun 2024 in eess.AS, cs.AI, and cs.SD

Abstract: Audio segmentation is a key task for many speech technologies, most of which are based on neural networks, usually considered as black boxes, with high-level performances. However, in many domains, among which health or forensics, there is not only a need for good performance but also for explanations about the output decision. Explanations derived directly from latent representations need to satisfy "good" properties, such as informativeness, compactness, or modularity, to be interpretable. In this article, we propose an explainable-by-design audio segmentation model based on non-negative matrix factorization (NMF) which is a good candidate for the design of interpretable representations. This paper shows that our model reaches good segmentation performances, and presents deep analyses of the latent representation extracted from the non-negative matrix. The proposed approach opens new perspectives toward the evaluation of interpretable representations according to "good" properties.

Authors (5)
  1. Martin Lebourdais
  2. Théo Mariotte
  3. Antonio Almudévar
  4. Marie Tahon
  5. Alfonso Ortega
