Music Enhancement with Deep Filters: A Technical Report for The ICASSP 2024 Cadenza Challenge (2404.11116v1)

Published 17 Apr 2024 in cs.SD, cs.AI, cs.LG, cs.MM, and eess.AS

Abstract: In this challenge, we disentangle the deep filters from the original DeepFilterNet and incorporate them into our Spec-UNet-based network to further improve a hybrid Demucs (hdemucs) based remixing pipeline. The motivation behind the use of the deep filter component lies in its potential to better handle temporal fine structures. We demonstrate an incremental improvement in both the Signal-to-Distortion Ratio (SDR) and the Hearing Aid Audio Quality Index (HAAQI) metrics when comparing the performance of hdemucs against different versions of our model.

References (9)
  1. “Overview and Results of the ICASSP SP Cadenza Challenge: Music Demixing/Remixing for Hearing Aids,” in 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024.
  2. “High Fidelity Speech Enhancement with Band-Split RNN,” in Proc. Interspeech, 2023.
  3. “FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement,” in 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 6633–6637.
  4. Alexandre Défossez, “Hybrid Spectrogram and Waveform Source Separation,” arXiv preprint arXiv:2111.03600, 2021.
  5. “MUSDB18-HQ: An Uncompressed Version of MUSDB18,” 2019.
  6. “The Hearing-Aid Audio Quality Index (HAAQI),” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 2, pp. 354–365, 2015.
  7. “Singing Voice Separation with Deep U-Net Convolutional Networks,” in International Society for Music Information Retrieval Conference (ISMIR), 2017.
  8. “DeepFilterNet: A Low Complexity Speech Enhancement Framework for Full-Band Audio Based on Deep Filtering,” in 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 7407–7411.
  9. “Deep Multi-Frame Filtering for Hearing Aids,” in Proc. Interspeech, 2023.

Summary

  • The paper introduces a deep filtering technique, integrated into a Spec-UNet, aimed at improving music perception by better preserving temporal fine structures.
  • The pipeline processes multi-channel STFT inputs, separates the audio into component stems, and remixes them with listener-specific gain settings.
  • Experiments on the MUSDB18 dataset show measurable gains in SDR and HAAQI, supporting real-world applicability for hearing aid users.

Enhancement of Music Perception in Hearing Aids Using Spec-UNet and Deep Filters

Introduction and Motivation

The ICASSP 2024 Cadenza Challenge addresses the enhancement of music perception for individuals who use hearing aids. A distinguishing aspect of this paper is its focus on music enhancement rather than the more common speech-focused methods. The authors build on a hybrid Demucs (hdemucs) remixing pipeline, add a Spec-UNet-based refinement network, and integrate the deep filter component from DeepFilterNet, with the aim of handling temporal fine structures more carefully.

Model Design and Implementation

The model separates the input audio mixture from the hearing aid microphones into component stems (drums, bass, vocals, and other), then remixes and amplifies them according to user-specific gain settings. The key modification is a Spec-UNet refinement stage whose output head produces either conventional complex ratio masks or, as a new alternative, deep filter coefficients intended to improve audio quality:

  • Input Handling: Inputs comprise the STFT of the original audio and its gain-adjusted versions, stacked as multiple channels over the time and frequency dimensions.
  • Network Details: The Spec-UNet refines the separated audio in the STFT domain, optionally applying deep filtering to better preserve the temporal fine structures emphasized by the HAAQI metric (a minimal sketch follows this list).
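
To make the deep filtering step concrete, below is a minimal sketch of DeepFilterNet-style multi-frame filtering applied to a complex STFT: each time-frequency bin is reconstructed as a complex-weighted sum of the current and past STFT frames, Y[f, t] = Σₙ H[f, t, n] · X[f, t − n]. The tensor shapes, the filter order, and the assumption that a Spec-UNet head predicts the coefficients H are illustrative; the paper's exact network interface may differ.

```python
# Minimal sketch of DeepFilterNet-style deep filtering on a complex STFT.
# Assumption: a network head (e.g., the paper's Spec-UNet) has already
# predicted the complex filter taps `coefs`; shapes here are illustrative.
import torch

def apply_deep_filter(spec: torch.Tensor, coefs: torch.Tensor, order: int) -> torch.Tensor:
    """spec:  complex STFT of one stem, shape (freq, time)
    coefs: complex filter taps, shape (freq, time, order)
    Returns Y with Y[f, t] = sum_n coefs[f, t, n] * spec[f, t - n]."""
    freq, time = spec.shape
    # Zero-pad past frames so the filter can look back (order - 1) steps.
    pad = torch.zeros(freq, order - 1, dtype=spec.dtype)
    padded = torch.cat([pad, spec], dim=1)  # (freq, time + order - 1)
    out = torch.zeros_like(spec)
    for n in range(order):
        # padded[:, order-1-n : order-1-n+time] is X[f, t - n] (zeros for t < n).
        out = out + coefs[..., n] * padded[:, order - 1 - n : order - 1 - n + time]
    return out

# Toy usage: a 4-tap filter over a 257-bin, 100-frame spectrogram.
X = torch.randn(257, 100, dtype=torch.complex64)
H = torch.randn(257, 100, 4, dtype=torch.complex64)
Y = apply_deep_filter(X, H, order=4)
```

Because each output bin draws on several past frames rather than a single masked frame, the filter can resolve structure finer than the STFT hop, which is the stated motivation for preferring deep filters over complex ratio masks.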

Experiments and Outcomes

The experimental validation uses the MUSDB18 dataset, with a quantitative evaluation based on the Signal-to-Distortion Ratio (SDR) and the Hearing Aid Audio Quality Index (HAAQI); a minimal SDR sketch follows the list below. The results demonstrate that:

  1. The deep filter variant achieves slight but consistent improvements in both SDR and HAAQI over the baseline hdemucs and the other network variants.
  2. These gains align with the theoretical expectation that multi-frame filtering handles temporal fine structures in audio signals better than single-frame masking.
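
For reference, the plain SDR definition used for such comparisons is SDR = 10 log₁₀(‖s‖² / ‖s − ŝ‖²). Below is a minimal sketch of that definition; note the official challenge evaluation may use a BSS-eval/museval variant with additional alignment and filtering steps.

```python
# Minimal sketch of the plain Signal-to-Distortion Ratio in dB.
# The official challenge scoring may use a BSS-eval/museval variant instead.
import numpy as np

def sdr(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-12) -> float:
    """SDR = 10 * log10(||s||^2 / ||s - s_hat||^2), with eps for stability."""
    ref = reference.astype(np.float64)
    est = estimate.astype(np.float64)
    num = np.sum(ref ** 2)
    den = np.sum((ref - est) ** 2)
    return 10.0 * np.log10((num + eps) / (den + eps))
```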

Conclusions and Further Research

The incremental improvements highlighted through SDR and HAAQI scores reinforce the hypothesis that integrating Spec-UNet and deep filters can benefit music perception for hearing aid users. Future work is suggested to extend the model's applicability across diverse listener profiles and further explore the scalability of the proposed methods in real-world applications.

This paper synthesizes current deep learning advances with a specific real-world application, suggesting a pathway for continued research in enhancing hearing aid technology through focused audio signal processing improvements.
