Speech Denoising with Auditory Models (2011.10706v3)

Published 21 Nov 2020 in eess.AS and cs.SD

Abstract: Contemporary speech enhancement predominantly relies on audio transforms that are trained to reconstruct a clean speech waveform. The development of high-performing neural network sound recognition systems has raised the possibility of using deep feature representations as 'perceptual' losses with which to train denoising systems. We explored their utility by first training deep neural networks to classify either spoken words or environmental sounds from audio. We then trained an audio transform to map noisy speech to an audio waveform that minimized the difference in the deep feature representations between the output audio and the corresponding clean audio. The resulting transforms removed noise substantially better than baseline methods trained to reconstruct clean waveforms, and also outperformed previous methods using deep feature losses. However, a similar benefit was obtained simply by using losses derived from the filter bank inputs to the deep networks. The results show that deep features can guide speech enhancement, but suggest that they do not yet outperform simple alternatives that do not involve learned features.
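The core idea in the abstract is to train a denoising transform against a "deep feature" loss: the output and the clean reference are both passed through a frozen recognition network, and the training objective is the distance between their internal activations rather than between the raw waveforms. A minimal sketch of such a loss is below, using NumPy and a toy random feed-forward network; the network shape, ReLU nonlinearity, and L1 distance are illustrative assumptions, not the word/environmental-sound classifiers the paper actually trained.

```python
import numpy as np

def deep_feature_loss(output, clean, layers):
    """Sum of mean L1 distances between the layer activations a frozen
    recognition network produces for the denoised output vs. the clean
    reference. `layers` is a list of (weight, bias) pairs; a ReLU
    nonlinearity is assumed at each layer (an illustrative choice)."""
    loss = 0.0
    a, b = output, clean
    for W, bias in layers:
        a = np.maximum(W @ a + bias, 0.0)  # activations for the output
        b = np.maximum(W @ b + bias, 0.0)  # activations for the reference
        loss += np.mean(np.abs(a - b))
    return loss

# Toy frozen "recognition network" and signals (hypothetical shapes).
rng = np.random.default_rng(0)
layers = [(0.1 * rng.standard_normal((16, 64)), np.zeros(16)),
          (0.1 * rng.standard_normal((8, 16)), np.zeros(8))]
clean = rng.standard_normal(64)
noisy = clean + 0.5 * rng.standard_normal(64)

# A perfect denoiser incurs zero loss; residual noise increases it.
print(deep_feature_loss(clean, clean, layers))  # 0.0
print(deep_feature_loss(noisy, clean, layers) > 0.0)  # True
```

The paper's comparison condition replaces the deep activations with the filter bank inputs to the same networks, i.e. the loss above computed on the first (fixed, non-learned) stage only, which the authors found performed comparably.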

Authors (6)
  1. Mark R. Saddler (1 paper)
  2. Andrew Francl (2 papers)
  3. Jenelle Feather (6 papers)
  4. Kaizhi Qian (23 papers)
  5. Yang Zhang (1129 papers)
  6. Josh H. McDermott (6 papers)
Citations (4)
