Supervised and Unsupervised Speech Enhancement Using Nonnegative Matrix Factorization

Published 15 Sep 2017 in cs.SD and cs.LG | (1709.05362v1)

Abstract: Reducing the interference noise in a monaural noisy speech signal has been a challenging task for many years. Compared to traditional unsupervised speech enhancement methods, e.g., Wiener filtering, supervised approaches, such as algorithms based on hidden Markov models (HMM), lead to higher-quality enhanced speech signals. However, the main practical difficulty of these approaches is that for each noise type a model is required to be trained a priori. In this paper, we investigate a new class of supervised speech denoising algorithms using nonnegative matrix factorization (NMF). We propose a novel speech enhancement method that is based on a Bayesian formulation of NMF (BNMF). To circumvent the mismatch problem between the training and testing stages, we propose two solutions. First, we use an HMM in combination with BNMF (BNMF-HMM) to derive a minimum mean square error (MMSE) estimator for the speech signal with no information about the underlying noise type. Second, we suggest a scheme to learn the required noise BNMF model online, which is then used to develop an unsupervised speech enhancement system. Extensive experiments are carried out to investigate the performance of the proposed methods under different conditions. Moreover, we compare the performance of the developed algorithms with state-of-the-art speech enhancement schemes using various objective measures. Our simulations show that the proposed BNMF-based methods outperform the competing algorithms substantially.

Summary

The paper presents a Bayesian NMF (BNMF) formulation that models temporal dependencies in the speech and noise signals to improve the quality of the enhanced speech.
It introduces a BNMF-HMM framework that classifies the noise and enhances the speech simultaneously, without being told which noise type is present.
The study also proposes an unsupervised scheme that learns the noise model online and outperforms competing methods on objective measures such as source-to-distortion ratio (SDR), PESQ, and segmental SNR (SegSNR).

Overview of Speech Enhancement Using Bayesian Nonnegative Matrix Factorization

The paper "Supervised and Unsupervised Speech Enhancement Using Nonnegative Matrix Factorization" by Nasser Mohammadiha, Paris Smaragdis, and Arne Leijon addresses speech enhancement in both supervised and unsupervised settings. It uses nonnegative matrix factorization (NMF), specifically a Bayesian variant (BNMF), to reduce noise interference in monaural speech recordings.

The authors apply BNMF algorithms to improve speech quality in noisy environments and compare them with established techniques such as HMM-based systems and Wiener filtering. A central concern is the practical limitation of supervised systems, which require a model to be trained a priori for each noise type. By introducing Bayesian models for speech enhancement and incorporating temporal dependencies, the authors present methods that do not need to know the noise environment in advance.

Key Contributions

Bayesian NMF for Speech Enhancement: The research develops a Bayesian formulation of NMF (BNMF) that models temporal dependencies in the speech and noise signals. The Bayesian approach allows hierarchical prior distributions to be constructed, which improves the speech enhancement algorithm across a range of noisy conditions.
BNMF-HMM Framework: The authors combine BNMF with a hidden Markov model (BNMF-HMM), which enables simultaneous noise classification and speech enhancement. The BNMF-HMM structure derives a minimum mean square error (MMSE) estimator for the speech signal with no information about the underlying noise type, thereby addressing the mismatch between training and testing stages seen in previous methods.
Online Noise Model Learning: The paper presents an unsupervised scheme in which the noise basis matrix is learned online from the noisy mixture. This approach, which the authors term Online BNMF, is shown to outperform other state-of-the-art unsupervised enhancement techniques because it adapts to the noise characteristics as they are observed.
Performance Evaluation: Through extensive simulations and objective measures such as SDR, PESQ, and SegSNR, the paper demonstrates that BNMF-based methods significantly enhance the quality of speech signals compared to traditional speech enhancement approaches. The experiments illustrate the capability of the proposed methods to operate effectively in diverse and atypical noise conditions.
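
The supervised NMF idea behind these contributions can be sketched in a few lines. The example below is a minimal illustration, not the paper's method: it uses plain NMF with the Kullback-Leibler divergence and multiplicative updates instead of the Bayesian formulation, and it omits the priors, the HMM, and the online noise learning. It assumes nonnegative magnitude spectrograms as input (frequency bins by frames); the function names `kl_nmf`, `kl_div`, and `enhance`, the ranks, and the iteration counts are hypothetical choices made for illustration.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log(0)


def kl_nmf(V, rank, n_iter=200, W=None, seed=0):
    """Factor V ~= W @ H with KL-divergence multiplicative updates.

    If W is supplied it is held fixed and only the activations H are
    estimated (rank is then ignored); otherwise both factors are learned.
    """
    rng = np.random.default_rng(seed)
    update_W = W is None
    if update_W:
        W = rng.random((V.shape[0], rank)) + EPS
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    ones = np.ones_like(V)
    for _ in range(n_iter):
        WH = W @ H + EPS
        H *= (W.T @ (V / WH)) / (W.T @ ones + EPS)
        if update_W:
            WH = W @ H + EPS
            W *= ((V / WH) @ H.T) / (ones @ H.T + EPS)
    if update_W:
        # Normalize basis columns to unit sum; move the scale into H.
        scale = W.sum(axis=0, keepdims=True) + EPS
        W = W / scale
        H = H * scale.T
    return W, H


def kl_div(V, V_hat):
    """Generalized KL divergence between V and its approximation V_hat."""
    V_hat = V_hat + EPS
    return float(np.sum(V * np.log((V + EPS) / V_hat) - V + V_hat))


def enhance(V_mix, W_speech, W_noise, n_iter=200):
    """Estimate the speech magnitude from a noisy magnitude spectrogram.

    The pretrained speech and noise bases are concatenated and held fixed,
    activations are fitted to the mixture, and a Wiener-like soft mask
    built from the two partial reconstructions is applied to the mixture.
    """
    W = np.hstack([W_speech, W_noise])
    _, H = kl_nmf(V_mix, W.shape[1], n_iter=n_iter, W=W)
    k = W_speech.shape[1]
    S_hat = W_speech @ H[:k]
    N_hat = W_noise @ H[k:]
    mask = S_hat / (S_hat + N_hat + EPS)
    return mask * V_mix, mask


if __name__ == "__main__":
    # Synthetic low-rank "spectrograms" stand in for real training data.
    rng = np.random.default_rng(1)
    V_speech = rng.random((64, 8)) @ rng.random((8, 100))
    V_noise = rng.random((64, 4)) @ rng.random((4, 100))
    W_s, _ = kl_nmf(V_speech, rank=8)
    W_n, _ = kl_nmf(V_noise, rank=4)
    S_est, mask = enhance(V_speech + V_noise, W_s, W_n)
    print("estimated speech magnitude shape:", S_est.shape)
```

In this sketch the noise basis `W_n` must be trained beforehand on the matching noise type, which is exactly the practical limitation the BNMF-HMM and Online BNMF contributions are designed to remove.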

Implications and Future Directions

The implications of this research extend into various domains requiring noise-robust speech processing, from hearing aid technology to telecommunication systems. The Bayesian nature of the proposed framework offers flexibility in modeling dynamic noise environments, a crucial factor for real-world applications. As this work primarily focuses on single-channel speech enhancement, future research might explore multi-channel or binaural settings where additional spatial information could be leveraged.

Furthermore, integrating NMF with deep learning models may improve the adaptability and performance of speech enhancement systems. Because it can learn noise characteristics in real time without pre-labeled data, BNMF is a promising building block for unsupervised speech processing.

In conclusion, this paper provides a substantial contribution to the field of speech enhancement by combining probabilistic models and matrix factorization techniques. The proposed BNMF frameworks demonstrate effectiveness in practical scenarios, thus advancing the capabilities of both supervised and unsupervised speech enhancement systems.
