NeRF-Enhanced Outpainting for Faithful Field-of-View Extrapolation (2309.13240v1)

Published 23 Sep 2023 in cs.CV and cs.RO

Abstract: In various applications, such as robotic navigation and remote visual assistance, expanding the field of view (FOV) of the camera proves beneficial for enhancing environmental perception. Unlike image outpainting techniques aimed solely at generating aesthetically pleasing visuals, these applications demand an extended view that faithfully represents the scene. To achieve this, we formulate a new problem of faithful FOV extrapolation that utilizes a set of pre-captured images as prior knowledge of the scene. To address this problem, we present a simple yet effective solution called NeRF-Enhanced Outpainting (NEO) that uses extended-FOV images generated through NeRF to train a scene-specific image outpainting model. To assess the performance of NEO, we conduct comprehensive evaluations on three photorealistic datasets and one real-world dataset. Extensive experiments on the benchmark datasets showcase the robustness and potential of our method in addressing this challenge. We believe our work lays a strong foundation for future exploration within the research community.
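
Below is a minimal sketch of the training idea described in the abstract: a NeRF fitted to the pre-captured scene images renders extended-FOV frames, and an outpainting network is then trained to reconstruct each full frame from its narrow-FOV center crop. The network architecture, loss, mask ratio, and the random tensors standing in for NeRF renders are all illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyOutpainter(nn.Module):
    """Toy encoder-decoder standing in for the scene-specific outpainting model."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(4, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, masked_img, mask):
        # Concatenate the narrow-FOV image with its binary FOV mask.
        x = torch.cat([masked_img, mask], dim=1)
        return self.dec(self.enc(x))

def center_fov_mask(h, w, ratio=0.5):
    """Binary mask: 1 inside the narrow (observed) FOV, 0 in the region to outpaint."""
    mask = torch.zeros(1, 1, h, w)
    mh, mw = int(h * ratio), int(w * ratio)
    top, left = (h - mh) // 2, (w - mw) // 2
    mask[:, :, top:top + mh, left:left + mw] = 1.0
    return mask

# Placeholder batch of "NeRF-rendered" extended-FOV frames; in the actual NEO pipeline
# these would be rendered by a NeRF fitted to the pre-captured images of the scene.
renders = torch.rand(8, 3, 128, 128)
mask = center_fov_mask(128, 128).expand(8, -1, -1, -1)

model = TinyOutpainter()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
for step in range(100):
    masked = renders * mask              # keep only the narrow-FOV content
    pred = model(masked, mask)
    loss = F.l1_loss(pred, renders)      # reconstruct the full extended-FOV frame
    opt.zero_grad()
    loss.backward()
    opt.step()
```

At inference time, the trained scene-specific model would receive a real narrow-FOV camera frame (plus its FOV mask) and hallucinate the surrounding content consistent with the pre-captured scene, which is what distinguishes faithful FOV extrapolation from generic, purely aesthetic outpainting.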
