A Novel Score-CAM based Denoiser for Spectrographic Signature Extraction without Ground Truth

Published 28 Oct 2024 in cs.SD, cs.LG, and eess.AS | (2410.21557v2)

Abstract: Sonar based audio classification techniques are a growing area of research in the field of underwater acoustics. Usually, underwater noise picked up by passive sonar transducers contains all types of signals that travel through the ocean and is transformed into spectrographic images. As a result, the corresponding spectrograms intended to display the temporal-frequency data of a certain object often include the tonal regions of abundant extraneous noise that can effectively interfere with a 'contact'. So, a majority of spectrographic samples extracted from underwater audio signals are rendered unusable due to their clutter and lack the required distinguishability between different objects. With limited clean true data for supervised training, creating classification models for these audio signals is severely bottlenecked. This paper derives several new techniques to combat this problem by developing a novel Score-CAM based denoiser to extract an object's signature from noisy spectrographic data without being given any ground truth data. In particular, this paper proposes a novel generative adversarial network architecture for learning and producing spectrographic training data in similar distributions to low-feature spectrogram inputs. In addition, this paper also proposes a generalizable class activation mapping based denoiser for different distributions of acoustic data, even real-world data distributions. Utilizing these novel architectures and proposed denoising techniques, these experiments demonstrate state-of-the-art noise reduction accuracy and higher classification accuracy than current audio classification standards. As such, this approach has applications not only to audio data but also to countless data distributions used all around the world for machine learning.


Authors (1)

Noel Elias


Summary

The paper presents a novel Score-CAM-based denoiser that extracts tonal features from noisy underwater spectrograms without relying on extensive ground truth data.
It leverages Wasserstein GANs and K-Means clustering to generate synthetic data and create precise class activation masks, achieving roughly 86% noise reduction.
The approach improves machine learning performance in underwater acoustics and opens avenues for robust unsupervised and semi-supervised learning applications.

A Novel Score-CAM Based Denoiser for Spectrographic Signature Extraction without Ground Truth

The paper under review introduces a Score-CAM-based denoiser for spectrographic data, aimed primarily at sonar-based audio signals. The core challenge it addresses is the contamination of underwater audio spectrograms with extraneous noise, which significantly degrades the performance of machine learning models that classify these signals. Because clean, ground-truth spectrographic samples are scarce in underwater acoustics, the research focuses on methods that do not depend on large labeled datasets.

Methodological Innovations

The proposed denoising strategy incorporates several innovative techniques:

Generative Adversarial Networks (GANs): The paper leverages Wasserstein GANs to generate additional spectrographic data mirroring the noise distribution observed in real samples. This synthetic data serves to enhance the diversity and amount of training data available, facilitating better model generalization.
Score-CAM Based Denoising: A key contribution is the use of Score-CAM (Score-Weighted Class Activation Mapping) to produce saliency maps that identify and extract tonal regions of interest from noisy spectrograms. Because Score-CAM weights each activation map by the model's output score rather than by gradient information, the resulting maps are less disturbed by noise in gradient calculations.
Image Clustering for Mask Generation: The approach encompasses the clustering of embedding features to identify representative spectrogram samples, thereby creating more accurate class activation masks. By using K-Means clustering with K-Means++ initialization, the method identifies centroids that best capture the diversity within a class.
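
The score-weighting idea behind Score-CAM, as described in the list above, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the toy `toy_score_fn`, the hand-made activation maps, and every function name here are assumptions made for illustration. The softmax over per-map scores follows the commonly used Score-CAM formulation, which this summary does not confirm as the paper's exact variant.

```python
import numpy as np

def normalize_map(a, eps=1e-8):
    """Rescale a map to roughly [0, 1]; a constant map becomes all zeros."""
    return (a - a.min()) / (a.max() - a.min() + eps)

def softmax(v):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(v - v.max())
    return e / e.sum()

def score_cam(image, activations, score_fn, target):
    """Gradient-free saliency map.

    image:       (H, W) spectrogram
    activations: (K, H, W) feature maps, assumed already upsampled to (H, W)
    score_fn:    hypothetical classifier, maps an (H, W) array to class scores
    target:      index of the class whose signature we want to highlight
    """
    # 1. Turn each activation map into a soft mask in [0, 1].
    masks = np.stack([normalize_map(a) for a in activations])
    # 2. Score the masked input for the target class (no gradients involved).
    scores = np.array([score_fn(image * m)[target] for m in masks])
    # 3. Convert scores into weights and combine the activation maps.
    weights = softmax(scores)
    cam = np.maximum(np.tensordot(weights, activations, axes=1), 0.0)
    return normalize_map(cam), weights

# Toy spectrogram: a strong tonal line in row 1, faint clutter in row 3.
image = np.zeros((4, 4))
image[1, :] = 1.0
image[3, :] = 0.2

# Two hand-made activation maps: one fires on row 1, the other on row 3.
activations = np.zeros((2, 4, 4))
activations[0, 1, :] = 1.0
activations[1, 3, :] = 1.0

def toy_score_fn(x):
    """Stand-in classifier: class 0 responds to row 1, class 1 to row 3."""
    return np.array([x[1].sum(), x[3].sum()])

cam, weights = score_cam(image, activations, toy_score_fn, target=0)
```

In this toy setup the map covering the tonal line receives almost all of the weight, so the resulting saliency map is concentrated on row 1. In practice the activation maps would come from a convolutional layer of a trained classifier and would need to be upsampled to the spectrogram size first.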

Experimental Results and Implications

The empirical results indicate that the proposed methods denoise spectrographic data better than more traditional approaches such as auto-encoders. Notably, a confidence threshold of 0.75 is reported to give the best balance between noise reduction and retention of critical tonal features. At this setting the method removes approximately 86% of unwanted noise, outperforming the other tested configurations.
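
To make the role of the confidence threshold concrete, the following sketch applies a thresholded saliency map to a small spectrogram. It is illustrative only: the function names, the toy arrays, and the `noise_removed_fraction` metric are assumptions, and the summary does not say how the paper computes its noise-reduction figure. The metric here relies on a known signal region, which is available only because the example is synthetic.

```python
import numpy as np

def apply_cam_mask(spectrogram, cam, threshold=0.75):
    """Keep spectrogram bins whose saliency is at least `threshold`; zero the rest."""
    mask = cam >= threshold
    return np.where(mask, spectrogram, 0.0), mask

def noise_removed_fraction(noisy, denoised, signal_region):
    """Assumed metric: share of energy outside the signal region that was removed."""
    noise_before = noisy[~signal_region].sum()
    noise_after = denoised[~signal_region].sum()
    return 1.0 - noise_after / noise_before

# Toy spectrogram: row 1 holds the tonal signature, rows 0 and 2 are clutter.
spectrogram = np.array([
    [0.2, 0.1, 0.3, 0.2],
    [1.0, 0.9, 1.0, 0.8],
    [0.1, 0.4, 0.2, 0.1],
])

# Hypothetical saliency map in [0, 1], e.g. produced by a CAM method.
cam = np.array([
    [0.10, 0.20, 0.80, 0.10],
    [0.90, 0.95, 1.00, 0.80],
    [0.00, 0.30, 0.10, 0.20],
])

# Ground-truth signal location, known here only because the data is synthetic.
signal_region = np.zeros_like(spectrogram, dtype=bool)
signal_region[1, :] = True

denoised, mask = apply_cam_mask(spectrogram, cam, threshold=0.75)
removed = noise_removed_fraction(spectrogram, denoised, signal_region)
```

Raising the threshold removes more clutter but risks cutting into weaker parts of the tonal signature, while lowering it has the opposite effect. That trade-off is the one the reported 0.75 setting is said to balance.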

The implications of this research are significant, especially for applications where clean data is difficult to obtain. The technique could be extended to other noisy domains, aiding accurate signal classification and object recognition without extensive labeled datasets. It also offers a path toward more robust machine learning systems that can handle real-world variability and noise, a critical requirement for AI-driven analysis in fields like underwater acoustics.

Future Prospects

Looking forward, the research opens several avenues for further exploration. Enhancing the GAN architecture to produce even higher fidelity synthetic data could further improve the classification models. Additionally, the integration of this denoising framework with other machine learning architectures, particularly those incorporating transfer learning, could provide insights into building more flexible and adaptive systems. Another promising direction could be exploring semi-supervised learning paradigms that further minimize the dependency on labeled data, pushing the boundaries of unsupervised domain adaptation.

In conclusion, this paper presents a commendable step in the evolution of denoising techniques, providing a crucial tool for the machine learning community dealing with spectrographic and similarly structured data. Its reliance on minimal ground truth, coupled with demonstrable efficacy, empowers AI systems to operate effectively in challenging, noisy environments.
