An Intra-BRNN and GB-RVQ Based End-to-End Neural Audio Codec (2402.01271v1)
Abstract: Neural networks have recently proven effective for speech coding at low bitrates. However, under-utilization of intra-frame correlations and quantizer error both degrade the reconstructed audio quality. To improve coding quality, we present an end-to-end neural speech codec named CBRC (Convolutional and Bidirectional Recurrent neural Codec). An interleaved structure of 1D-CNN and Intra-BRNN layers is designed to exploit intra-frame correlations more efficiently. Furthermore, a Group-wise and Beam-search Residual Vector Quantizer (GB-RVQ) is used to reduce quantization noise. CBRC encodes audio every 20 ms with no additional latency, which makes it suitable for real-time communication. Experimental results demonstrate the superiority of the proposed codec when comparing CBRC at 3 kbps with Opus at 12 kbps.
- Linping Xu
- Jiawei Jiang
- Dejun Zhang
- Xianjun Xia
- Li Chen
- Yijian Xiao
- Piao Ding
- Shenyi Song
- Sixing Yin
- Ferdous Sohel
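
The abstract names a group-wise, beam-search residual vector quantizer without giving details, so the following is only a minimal sketch of the general technique, not the authors' implementation. It assumes that "group-wise" means splitting the latent vector into contiguous sub-vectors, each with its own stack of codebooks, and that "beam-search" means keeping the `beam_width` lowest-error index sequences at every residual stage instead of only the single greedy choice. The codebooks are random placeholders, and all function names (`make_codebooks`, `beam_rvq_encode`, `group_rvq_encode`, `bits_per_frame`) are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def make_codebooks(num_stages, codebook_size, dim, seed=0):
    """Random placeholder codebooks; a real codec would learn these."""
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 1.0 / (s + 1)) for _ in range(dim)]
         for _ in range(codebook_size)]
        for s in range(num_stages)
    ]


def beam_rvq_encode(x, codebooks, beam_width=1):
    """Residual VQ with beam search; beam_width=1 is plain greedy RVQ.

    Returns (indices, quantized, error) for the best surviving hypothesis.
    """
    # Each hypothesis is (squared error, chosen indices, remaining residual).
    beams = [(sum(v * v for v in x), [], list(x))]
    for codebook in codebooks:
        candidates = []
        for _, indices, residual in beams:
            for k, code in enumerate(codebook):
                new_res = [r - c for r, c in zip(residual, code)]
                err = sum(v * v for v in new_res)
                candidates.append((err, indices + [k], new_res))
        # Keep only the beam_width hypotheses with the lowest error so far.
        candidates.sort(key=lambda h: h[0])
        beams = candidates[:beam_width]
    err, indices, residual = beams[0]
    quantized = [xi - ri for xi, ri in zip(x, residual)]
    return indices, quantized, err


def group_rvq_encode(x, group_codebooks, beam_width=1):
    """Split x into equal contiguous groups and quantize each independently.

    Assumes len(x) is divisible by the number of groups.
    """
    size = len(x) // len(group_codebooks)
    all_indices, quantized = [], []
    for g, codebooks in enumerate(group_codebooks):
        sub = x[g * size:(g + 1) * size]
        idx, q, _ = beam_rvq_encode(sub, codebooks, beam_width)
        all_indices.append(idx)
        quantized.extend(q)
    return all_indices, quantized


def bits_per_frame(bitrate_bps, frame_ms):
    """Bit budget available for one frame at the given bitrate."""
    return bitrate_bps * frame_ms / 1000


if __name__ == "__main__":
    x = [0.3, -0.7, 1.1, 0.2]
    # Two groups of dimension 2, each with 3 stages of 4 codewords.
    groups = [make_codebooks(3, 4, 2, seed=g) for g in range(2)]
    indices, x_hat = group_rvq_encode(x, groups, beam_width=4)
    print(indices, sq_dist(x, x_hat))
    print(bits_per_frame(3000, 20))
```

At the figures quoted in the abstract, 3 kbps with one frame every 20 ms leaves 3000 × 0.02 = 60 bits per frame, which is the total the quantizer indices have to fit into. Note that a wider beam is not guaranteed to beat greedy search at every width; only an exhaustive beam (one that keeps every hypothesis) is guaranteed to match or improve on the greedy result.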