
HILCodec: High-Fidelity and Lightweight Neural Audio Codec (2405.04752v2)

Published 8 May 2024 in eess.AS and cs.SD

Abstract: The recent advancement of end-to-end neural audio codecs enables compressing audio at very low bitrates while reconstructing the output audio with high fidelity. Nonetheless, such improvements often come at the cost of increased model complexity. In this paper, we identify and address the problems of existing neural audio codecs. We show that the performance of the SEANet-based codec does not increase consistently as the network depth increases. We analyze the root cause of such a phenomenon and suggest a variance-constrained design. Also, we reveal various distortions in previous waveform domain discriminators and propose a novel distortion-free discriminator. The resulting model, HILCodec, is a real-time streaming audio codec that demonstrates state-of-the-art quality across various bitrates and audio types.

Authors (5)
  1. Sunghwan Ahn (6 papers)
  2. Beom Jun Woo (3 papers)
  3. Min Hyun Han (11 papers)
  4. Chanyeong Moon (1 paper)
  5. Nam Soo Kim (47 papers)
Citations (3)
