
NoLACE: Improving Low-Complexity Speech Codec Enhancement Through Adaptive Temporal Shaping (2309.14521v2)

Published 25 Sep 2023 in eess.AS and cs.SD

Abstract: Speech codec enhancement methods are designed to remove distortions added by speech codecs. While classical methods are very low in complexity and add zero delay, their effectiveness is rather limited. In contrast, DNN-based methods deliver higher quality, but they are typically high in complexity and/or require delay. The recently proposed Linear Adaptive Coding Enhancer (LACE) addresses this problem by combining DNNs with classical long-term/short-term postfiltering, resulting in a causal low-complexity model. A shortcoming of the LACE model, however, is that quality quickly saturates when the model size is scaled up. To mitigate this problem, we propose a novel adaptive temporal shaping module that adds high temporal resolution to the LACE model, resulting in the Non-Linear Adaptive Coding Enhancer (NoLACE). We adapt NoLACE to enhance the Opus codec and show that NoLACE significantly outperforms both the Opus baseline and an enlarged LACE model at 6, 9, and 12 kb/s. We also show that LACE and NoLACE are well-behaved when used with an ASR system.
