
VQ-NeRV: A Vector Quantized Neural Representation for Videos (2403.12401v1)

Published 19 Mar 2024 in cs.CV

Abstract: Implicit neural representations (INRs) excel at encoding videos within neural networks and show promise in computer vision tasks such as video compression and denoising. INR-based approaches reconstruct video frames from content-agnostic embeddings, which hampers their efficacy in video frame regression and restricts their generalization ability for video interpolation. To address these deficiencies, Hybrid Neural Representation for Videos (HNeRV) was introduced with content-adaptive embeddings. Nevertheless, HNeRV's compression ratios remain relatively low because it does not exploit the network's shallow features or inter-frame residual information. In this work, we introduce an advanced U-shaped architecture, Vector Quantized-NeRV (VQ-NeRV), which integrates a novel component, the VQ-NeRV Block. This block uses a codebook mechanism to discretize the network's shallow residual features and inter-frame residual information. This approach is particularly advantageous for video compression, as the discretized representation is smaller than the quantized features. Furthermore, we introduce a codebook optimization technique, termed shallow codebook optimization, designed to improve the utility and efficiency of the codebook. Experimental evaluations indicate that VQ-NeRV outperforms HNeRV on video regression tasks, delivering superior reconstruction quality (an increase of 1-2 dB in Peak Signal-to-Noise Ratio (PSNR)), better bits-per-pixel (bpp) efficiency, and improved video inpainting outcomes.
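
To make the codebook idea and the PSNR metric in the abstract concrete, here is a minimal NumPy sketch of generic nearest-neighbor vector quantization. It is not the authors' VQ-NeRV Block or their shallow codebook optimization; the function names (`quantize`, `psnr`, `index_bits`), the L2 distance, and the fixed-length index cost are all assumptions made for illustration.

```python
import numpy as np


def quantize(features, codebook):
    """Replace each feature vector with its nearest codebook entry.

    features: array of shape (N, D); codebook: array of shape (K, D).
    Returns (indices of shape (N,), quantized features of shape (N, D)).
    Assumption: nearest neighbor under squared L2 distance.
    """
    # Pairwise squared distances between features and codes: shape (N, K).
    diff = features[:, None, :] - codebook[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)
    indices = dist2.argmin(axis=1)
    return indices, codebook[indices]


def psnr(reference, reconstruction, max_value=1.0):
    """Peak Signal-to-Noise Ratio in dB for signals scaled to [0, max_value]."""
    mse = np.mean((reference - reconstruction) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)


def index_bits(num_vectors, codebook_size):
    """Bits needed to store one fixed-length index per vector (no entropy coding)."""
    return num_vectors * (codebook_size - 1).bit_length()
```

Under these assumptions, storing 1000 feature vectors against a 512-entry codebook costs `index_bits(1000, 512)` = 9000 bits for the indices, plus the one-time cost of the codebook itself, regardless of the feature dimension. This is the general reason a discretized representation can be smaller than storing each quantized feature value directly.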

Authors (6)
  1. Yunjie Xu
  2. Xiang Feng
  3. Feiwei Qin
  4. Ruiquan Ge
  5. Yong Peng
  6. Changmiao Wang