
Boosting Self-Supervised Embeddings for Speech Enhancement (2204.03339v2)

Published 7 Apr 2022 in eess.AS

Abstract: Self-supervised learning (SSL) representations for speech have achieved state-of-the-art (SOTA) performance on several downstream tasks; however, there remains room for improvement on speech enhancement (SE). In this study, we use a cross-domain feature to address the problem that SSL embeddings may lack the fine-grained information needed to regenerate speech signals. By integrating the SSL representation with the spectrogram, performance can be significantly boosted. We further study the relationship between the noise robustness of SSL representations, measured via the clean-noisy (CN) distance, and layer importance for SE, and find that SSL representations with lower noise robustness are more important. Furthermore, our experiments on the VCTK-DEMAND dataset demonstrate that fine-tuning an SSL representation with an SE model can outperform SOTA SSL-based SE methods in PESQ, CSIG, and COVL without invoking complicated network architectures. In later experiments, the CN distance of SSL embeddings was observed to increase after fine-tuning. These results verify our expectations and may help guide the design of SE-related SSL training in the future.
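The two mechanisms the abstract names, fusing SSL embeddings with a spectrogram into a cross-domain feature and scoring per-layer noise robustness with a clean-noisy distance, can be illustrated with a short sketch. This is not the authors' code: the SSL encoder is stood in by random tensors, and the layer count, feature dimensions, softmax layer weighting, and cosine-based distance are illustrative assumptions.

```python
# Hedged sketch (assumptions, not the paper's implementation):
# (1) cross-domain feature = weighted sum of SSL layers concatenated with a
#     spectrogram; (2) per-layer clean-noisy (CN) distance between SSL
#     embeddings of clean and noisy versions of the same utterance.
import torch
import torch.nn.functional as F

T, D_SSL, N_FFT, N_LAYERS = 100, 768, 512, 12  # assumed sizes

def cross_domain_feature(ssl_layers, spectrogram, layer_weights):
    """Weighted sum over SSL layers, concatenated with the spectrogram.

    ssl_layers:    (n_layers, T, D_SSL) per-layer SSL embeddings
    spectrogram:   (T, n_fft // 2 + 1) log-magnitude spectrogram
    layer_weights: (n_layers,) learnable layer-importance weights
    """
    w = torch.softmax(layer_weights, dim=0)         # normalize importances
    ssl = torch.einsum("l,ltd->td", w, ssl_layers)  # weighted layer sum
    return torch.cat([ssl, spectrogram], dim=-1)    # (T, D_SSL + n_bins)

def cn_distance(clean_layers, noisy_layers):
    """Per-layer CN distance as mean cosine distance over frames (assumed
    metric). Lower values suggest embeddings more robust to noise."""
    cos = F.cosine_similarity(clean_layers, noisy_layers, dim=-1)  # (L, T)
    return (1.0 - cos).mean(dim=-1)                                # (L,)

# Stand-in tensors in place of a real SSL encoder's hidden states.
clean = torch.randn(N_LAYERS, T, D_SSL)
noisy = clean + 0.3 * torch.randn_like(clean)
spec = torch.randn(T, N_FFT // 2 + 1)
weights = torch.zeros(N_LAYERS, requires_grad=True)

feat = cross_domain_feature(noisy, spec, weights)
print(feat.shape)                 # torch.Size([100, 1025])
print(cn_distance(clean, noisy))  # one distance value per layer
```

Under this reading, the learned layer weights play the role of the layer-importance measure the abstract relates to noise robustness, and the CN distance can be recomputed before and after fine-tuning to reproduce the kind of comparison the authors describe.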

Authors (6)
  1. Kuo-Hsuan Hung (22 papers)
  2. Huan-Hsin Tseng (26 papers)
  3. Hsin-Tien Chiang (5 papers)
  4. Yu Tsao (200 papers)
  5. Chii-Wann Lin (3 papers)
  6. Szu-Wei Fu (46 papers)
Citations (34)
