
An Intra-BRNN and GB-RVQ Based End-to-End Neural Audio Codec (2402.01271v1)

Published 2 Feb 2024 in eess.AS and cs.SD

Abstract: Recently, neural networks have proven effective for speech coding at low bitrates. However, under-utilization of intra-frame correlations and quantizer error degrade the reconstructed audio quality. To improve coding quality, we present an end-to-end neural speech codec named CBRC (Convolutional and Bidirectional Recurrent neural Codec). An interleaved structure of 1D-CNN and Intra-BRNN is designed to exploit intra-frame correlations more efficiently. Furthermore, a Group-wise and Beam-search Residual Vector Quantizer (GB-RVQ) is used to reduce quantization noise. CBRC encodes audio every 20 ms with no additional latency, which makes it suitable for real-time communication. Experimental results demonstrate the superiority of the proposed codec when comparing CBRC at 3 kbps with Opus at 12 kbps.
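The abstract names a group-wise, beam-search residual vector quantizer but gives no details, so the following is only a minimal sketch of the general idea and not the authors' implementation. It assumes plain squared-error distance, randomly initialized codebooks, equal-sized groups, and hypothetical helper names (`rvq_beam_search`, `group_rvq`, `random_codebooks`, `bits_per_frame`). With `beam_width=1` it reduces to ordinary greedy residual vector quantization; a wider beam keeps several partial code sequences alive at each stage before committing.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rvq_beam_search(x, codebooks, beam_width=1):
    """Quantize x with a stack of codebooks (one per stage).

    Keeps the `beam_width` lowest-error partial paths after every stage.
    Returns (indices, reconstruction, squared_error) of the best path.
    """
    zero = [0.0] * len(x)
    # Each hypothesis is (error, chosen indices so far, reconstruction so far).
    beams = [(sq_dist(x, zero), [], zero)]
    for codebook in codebooks:
        candidates = []
        for _, indices, recon in beams:
            for k, code in enumerate(codebook):
                new_recon = [r + c for r, c in zip(recon, code)]
                candidates.append((sq_dist(x, new_recon), indices + [k], new_recon))
        candidates.sort(key=lambda h: h[0])
        beams = candidates[:beam_width]
    err, indices, recon = beams[0]
    return indices, recon, err


def group_rvq(x, group_codebooks, beam_width=1):
    """Split x into equal groups and quantize each with its own codebook stack.

    Assumes len(x) is divisible by the number of groups.
    """
    size = len(x) // len(group_codebooks)
    indices, recon, err = [], [], 0.0
    for g, codebooks in enumerate(group_codebooks):
        part = x[g * size:(g + 1) * size]
        idx, rec, e = rvq_beam_search(part, codebooks, beam_width)
        indices.append(idx)
        recon.extend(rec)
        err += e
    return indices, recon, err


def random_codebooks(rng, n_stages, n_codes, dim):
    """Illustrative untrained codebooks; a real codec would learn these."""
    return [[[rng.uniform(-1.0, 1.0) for _ in range(dim)]
             for _ in range(n_codes)]
            for _ in range(n_stages)]


def bits_per_frame(bitrate_bps, frame_ms):
    """Bit budget available to one frame at a given bitrate."""
    return bitrate_bps * frame_ms // 1000
```

A beam wide enough to keep every partial path turns the search exhaustive, so it can never do worse than the greedy choice; narrower beams trade some of that guarantee for lower search cost. For the operating point quoted in the abstract, `bits_per_frame(3000, 20)` gives a budget of 60 bits per 20 ms frame; how the paper divides those bits across groups and stages is not stated in the abstract.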

Authors (10)
  1. Linping Xu (1 paper)
  2. Jiawei Jiang (47 papers)
  3. Dejun Zhang (4 papers)
  4. Xianjun Xia (13 papers)
  5. Li Chen (590 papers)
  6. Yijian Xiao (8 papers)
  7. Piao Ding (2 papers)
  8. Shenyi Song (2 papers)
  9. Sixing Yin (4 papers)
  10. Ferdous Sohel (35 papers)
Citations (6)