Integrating Plug-and-Play Data Priors with Weighted Prediction Error for Speech Dereverberation (2312.02773v1)
Abstract: Speech dereverberation aims to alleviate the detrimental effects of late-reverberant components. Although the weighted prediction error (WPE) method has shown superior performance in dereverberation, there is still room to improve its performance and robustness in complex and noisy environments. Recent research has shown that integrating physics-based and data-driven methods enhances the performance of various signal processing tasks while maintaining interpretability. Motivated by these advances, this paper presents a novel dereverberation framework that incorporates data-driven speech priors into WPE. A plug-and-play (PnP) strategy, specifically regularization by denoising (RED), is used to inject speech prior information learned from data into the iterations of the optimization solver. Experimental results validate the effectiveness of the proposed approach.
Authors: Ziye Yang, Wenxing Yang, Kai Xie, Jie Chen