Optimizing Neural Speech Codec for Low-Bitrate Compression via Multi-Scale Encoding (2410.15749v1)

Published 21 Oct 2024 in cs.SD and eess.AS

Abstract: Neural speech codecs have demonstrated their ability to compress high-quality speech and audio by converting them into discrete token representations. Most existing methods utilize Residual Vector Quantization (RVQ) to encode speech into multiple layers of discrete codes with uniform time scales. However, this strategy overlooks the differences in information density across various speech features, leading to redundant encoding of sparse information, which limits the performance of these methods at low bitrate. This paper proposes MsCodec, a novel multi-scale neural speech codec that encodes speech into multiple layers of discrete codes, each corresponding to a different time scale. This encourages the model to decouple speech features according to their diverse information densities, consequently enhancing the performance of speech compression. Furthermore, we incorporate mutual information loss to augment the diversity among speech codes across different layers. Experimental results indicate that our proposed method significantly improves codec performance at low bitrate.
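The abstract contrasts standard RVQ, where every layer emits codes at the same time scale, with coding each layer at its own time scale. The sketch below shows that idea under assumptions that do not come from the paper. Downsampling is average pooling and upsampling is frame repetition. The codebooks are random and untrained, each with a zero codeword so a layer can emit "no correction". The helper names (`multiscale_rvq`, `avg_pool`, `upsample`, `bitrate_bps`), the strides `[4, 2, 1]`, the 50 Hz frame rate and the 1024-entry codebooks are all hypothetical. The mutual information loss is not modeled.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]

# Hypothetical sketch of multi-scale residual VQ; not MsCodec's actual architecture.


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def avg_pool(frames: List[Vector], stride: int) -> List[Vector]:
    """Average each window of `stride` consecutive frames (the last window may be shorter)."""
    pooled = []
    for start in range(0, len(frames), stride):
        window = frames[start:start + stride]
        pooled.append([sum(col) / len(window) for col in zip(*window)])
    return pooled


def upsample(frames: List[Vector], stride: int, length: int) -> List[Vector]:
    """Repeat each coarse frame `stride` times, then trim to `length` frames."""
    return [f for f in frames for _ in range(stride)][:length]


def multiscale_rvq(
    frames: List[Vector], codebooks: List[List[Vector]], strides: List[int]
) -> Tuple[List[List[int]], List[Vector]]:
    """Layer l quantizes the current residual at time scale strides[l].

    Returns one token sequence per layer and the summed reconstruction.
    """
    dim = len(frames[0])
    residual = [list(f) for f in frames]
    recon = [[0.0] * dim for _ in frames]
    tokens = []
    for codebook, stride in zip(codebooks, strides):
        coarse = avg_pool(residual, stride)  # fewer frames at coarser scales
        # Nearest-codeword lookup for each pooled residual frame.
        idx = [
            min(range(len(codebook)), key=lambda k: sq_dist(c, codebook[k]))
            for c in coarse
        ]
        quantized = upsample([codebook[k] for k in idx], stride, len(frames))
        for t in range(len(frames)):
            for d in range(dim):
                recon[t][d] += quantized[t][d]
                residual[t][d] -= quantized[t][d]  # the next layer codes what is left
        tokens.append(idx)
    return tokens, recon


def bitrate_bps(frame_rate_hz: float, strides: List[int], codebook_size: int) -> float:
    """Each layer emits frame_rate / stride tokens per second, log2(K) bits each."""
    bits_per_token = math.log2(codebook_size)
    return sum(frame_rate_hz / s * bits_per_token for s in strides)


if __name__ == "__main__":
    rng = random.Random(0)
    T, D, K = 16, 4, 8
    frames = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(T)]
    strides = [4, 2, 1]  # hypothetical coarse-to-fine ordering
    # Random (untrained) codebooks; codeword 0 is the zero vector.
    codebooks = [
        [[0.0] * D] + [[rng.gauss(0, 1) for _ in range(D)] for _ in range(K - 1)]
        for _ in strides
    ]
    tokens, recon = multiscale_rvq(frames, codebooks, strides)
    print([len(t) for t in tokens])             # [4, 8, 16]
    print(bitrate_bps(50.0, [1, 1, 1], 1024))   # uniform 3-layer RVQ: 1500.0
    print(bitrate_bps(50.0, [4, 2, 1], 1024))   # multi-scale 3-layer: 875.0
```

With these hypothetical settings, three uniform-scale layers cost 1500 bps, while the `[4, 2, 1]` layout costs 875 bps because the coarse layers emit far fewer tokens. The abstract does not say which scales or codebook sizes MsCodec actually uses.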
