RealLiFe: Real-Time Light Field Reconstruction via Hierarchical Sparse Gradient Descent (2307.03017v3)
Abstract: With the rise of Extended Reality (XR) technology, there is a growing need for real-time light field generation from sparse view inputs. Existing methods fall into two camps: offline techniques, which generate high-quality novel views at the cost of long inference/training time, and online methods, which either lack generalizability or produce unsatisfactory results. We observe, however, that the intrinsic sparse manifold of Multi-plane Images (MPI) enables significant acceleration of light field generation while maintaining rendering quality. Based on this insight, we introduce RealLiFe, a novel light field optimization method that leverages the proposed Hierarchical Sparse Gradient Descent (HSGD) to produce high-quality light fields from sparse view images in real time. Technically, a coarse MPI of the scene is first generated by a 3D CNN and then sparsely optimized over a few iterations by focusing only on the important MPI gradients. Relying solely on optimization, however, can leave artifacts at occlusion boundaries, so we further propose an occlusion-aware iterative refinement module that removes visual artifacts in occluded regions by iteratively filtering the input. Extensive experiments demonstrate that our method achieves comparable visual quality while being 100x faster on average than state-of-the-art offline methods, and delivers better performance (about 2 dB higher PSNR) than other online approaches.
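To make the sparse-update idea concrete, below is a minimal, hypothetical PyTorch sketch: it composites an MPI with the standard back-to-front over operation and takes plain gradient steps that keep only the top-k gradient entries by magnitude. The paper's actual HSGD predicts updates hierarchically with a learned network and warps planes to target views; `composite_mpi`, `sparse_gd_step`, and all parameter values here are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of sparse gradient descent on a Multi-plane Image (MPI).
# Simplifications vs. the paper: plain SGD stands in for the learned update,
# and compositing is done at the reference view (no per-plane homography warp).
import torch

def composite_mpi(mpi: torch.Tensor) -> torch.Tensor:
    """Back-to-front over-compositing of an MPI of shape (D, 4, H, W),
    where channels 0..2 are RGB and channel 3 is alpha."""
    rgb, alpha = mpi[:, :3], mpi[:, 3:4].clamp(0.0, 1.0)
    out = torch.zeros_like(rgb[0])
    for d in range(mpi.shape[0]):          # planes ordered back to front
        out = rgb[d] * alpha[d] + out * (1.0 - alpha[d])
    return out

def sparse_gd_step(mpi, target, lr=0.1, keep_ratio=0.05):
    """One descent step that updates only the largest-magnitude gradients."""
    mpi = mpi.detach().requires_grad_(True)
    loss = torch.nn.functional.mse_loss(composite_mpi(mpi), target)
    loss.backward()
    g = mpi.grad
    k = max(1, int(keep_ratio * g.numel()))
    # Threshold at the k-th largest |gradient|; zero out everything below it.
    thresh = g.abs().flatten().kthvalue(g.numel() - k + 1).values
    mask = (g.abs() >= thresh).float()
    return (mpi - lr * g * mask).detach(), loss.item()

# Usage: a few sparse iterations on a random MPI against a random target view.
mpi = torch.rand(8, 4, 64, 64)             # 8 planes, RGBA, 64x64
target = torch.rand(3, 64, 64)
for it in range(5):
    mpi, loss = sparse_gd_step(mpi, target)
    print(f"iter {it}: loss={loss:.4f}")
```

Because only a small fraction of MPI entries receive updates each step, the per-iteration cost drops roughly in proportion to `keep_ratio`, which is the intuition behind the claimed speedup over dense optimization.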