
VQ-NeRF: Vector Quantization Enhances Implicit Neural Representations (2310.14487v1)

Published 23 Oct 2023 in cs.CV and cs.AI

Abstract: Recent advancements in implicit neural representations have contributed to high-fidelity surface reconstruction and photorealistic novel view synthesis. However, the computational complexity inherent in these methodologies presents a substantial impediment, constraining the attainable frame rates and resolutions in practical applications. In response to this predicament, we propose VQ-NeRF, an effective and efficient pipeline for enhancing implicit neural representations via vector quantization. The essence of our method involves reducing the sampling space of NeRF to a lower resolution and subsequently reinstating it to the original size utilizing a pre-trained VAE decoder, thereby effectively mitigating the sampling time bottleneck encountered during rendering. Although the codebook furnishes representative features, reconstructing fine texture details of the scene remains challenging due to high compression rates. To overcome this constraint, we design an innovative multi-scale NeRF sampling scheme that concurrently optimizes the NeRF model at both compressed and original scales to enhance the network's ability to preserve fine details. Furthermore, we incorporate a semantic loss function to improve the geometric fidelity and semantic coherence of our 3D reconstructions. Extensive experiments demonstrate the effectiveness of our model in achieving the optimal trade-off between rendering quality and efficiency. Evaluation on the DTU, BlendedMVS, and H3DS datasets confirms the superior performance of our approach.
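
As a rough illustration of the vector quantization idea the abstract refers to, the sketch below snaps each feature vector to its nearest entry in a codebook. It is not the authors' implementation: the function name `quantize`, the toy codebook, and the toy features are all hypothetical, and the real pipeline operates on learned features that are then passed to a pre-trained VAE decoder.

```python
import math

def quantize(features, codebook):
    """Replace each feature vector with its nearest codebook entry (L2 distance)."""
    indices = []
    quantized = []
    for f in features:
        # Index of the codebook vector closest to f.
        best = min(range(len(codebook)), key=lambda k: math.dist(f, codebook[k]))
        indices.append(best)
        quantized.append(codebook[best])
    return indices, quantized

# Hypothetical toy values, chosen only to make the lookup easy to follow.
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
features = [(0.1, -0.2), (0.9, 1.2), (0.2, 0.8), (0.6, 0.7)]

indices, quantized = quantize(features, codebook)
print(indices)
```

Storing only the integer indices is what yields the compression; the high compression rate is also why, as the abstract notes, fine texture detail is hard to recover from the codebook alone.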

Authors (7)
  1. Yiying Yang
  2. Wen Liu
  3. Fukun Yin
  4. Xin Chen
  5. Gang Yu
  6. Jiayuan Fan
  7. Tao Chen