
Timbre Difference Capturing in Anomalous Sound Detection (2410.22033v1)

Published 29 Oct 2024 in eess.AS and cs.SD

Abstract: This paper proposes a framework for explaining anomalous machine sounds in the context of anomalous sound detection (ASD). While ASD has been extensively explored, identifying how anomalous sounds differ from normal sounds is also beneficial for machine condition monitoring. However, existing sound difference captioning methods require anomalous sounds for training, which is impractical in typical machine condition monitoring settings where such sounds are unavailable. To solve this issue, we propose a new strategy for explaining anomalous differences that does not require anomalous sounds for training. Specifically, we introduce a framework that explains differences in predefined timbre attributes instead of using free-form text captions. Objective metrics of timbre attributes can be computed using timbral models developed through psychoacoustic research, enabling the estimation of how and which timbre attributes have changed from normal sounds without training machine learning models. Additionally, to accurately determine timbre differences regardless of variations in the normal training data, we developed a method that jointly conducts anomalous sound detection and timbre difference estimation based on a k-nearest neighbors method in an audio embedding space. Evaluation using the MIMII DG dataset demonstrated the effectiveness of the proposed method.
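The joint kNN idea described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the embedding model, the specific timbre attributes, and all function and variable names (`knn_score_and_timbre_diff`, `normal_embs`, `normal_timbre`) are hypothetical, and the anomaly score is taken here as the mean distance to the k nearest normal embeddings.

```python
import numpy as np

def knn_score_and_timbre_diff(query_emb, query_timbre,
                              normal_embs, normal_timbre, k=3):
    """Illustrative sketch (not the authors' code):
    - anomaly score: mean Euclidean distance from the query embedding
      to its k nearest normal embeddings;
    - timbre difference: query timbre attributes minus the mean
      attributes of those same k neighbors, so the explanation is
      relative to the most similar normal sounds rather than to the
      whole (possibly varied) normal training set."""
    dists = np.linalg.norm(normal_embs - query_emb, axis=1)  # (N,)
    nn_idx = np.argsort(dists)[:k]                           # k nearest normals
    anomaly_score = float(dists[nn_idx].mean())
    timbre_diff = query_timbre - normal_timbre[nn_idx].mean(axis=0)
    return anomaly_score, timbre_diff

# Toy usage: 2-D embeddings, two timbre attributes (e.g. sharpness, roughness)
normal_embs = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]])
normal_timbre = np.array([[1.0, 2.0], [1.0, 2.0], [1.0, 2.0], [1.0, 2.0]])

score, diff = knn_score_and_timbre_diff(
    np.array([2.0, 2.0]), np.array([3.0, 2.5]),
    normal_embs, normal_timbre, k=3)
```

Computing the timbre difference against the k nearest normal neighbors (instead of a global normal average) is what makes the explanation robust to variation in the normal data, which is the motivation the abstract gives for coupling detection and difference estimation.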

