Dream360: Diverse and Immersive Outdoor Virtual Scene Creation via Transformer-Based 360 Image Outpainting (2401.10564v1)

Published 19 Jan 2024 in cs.CV and cs.HC

Abstract: 360 images, with a field-of-view (FoV) of 180°×360°, provide immersive and realistic environments for emerging virtual reality (VR) applications, such as virtual tourism, where users desire to create diverse panoramic scenes from a narrow FoV photo they take from a viewpoint via portable devices. It thus brings us to a technical challenge: "How to allow the users to freely create diverse and immersive virtual scenes from a narrow FoV image with a specified viewport?" To this end, we propose a transformer-based 360 image outpainting framework called Dream360, which can generate diverse, high-fidelity, and high-resolution panoramas from user-selected viewports, considering the spherical properties of 360 images. Compared with existing methods, e.g., [3], which primarily focus on inputs with rectangular masks and central locations while overlooking the spherical property of 360 images, our Dream360 offers higher outpainting flexibility and fidelity based on the spherical representation. Dream360 comprises two key learning stages: (I) codebook-based panorama outpainting via Spherical-VQGAN (S-VQGAN), and (II) frequency-aware refinement with a novel frequency-aware consistency loss. Specifically, S-VQGAN learns a sphere-specific codebook from spherical harmonic (SH) values, providing a better representation of spherical data distribution for scene modeling. The frequency-aware refinement matches the resolution and further improves the semantic consistency and visual fidelity of the generated results. Our Dream360 achieves significantly lower Fréchet Inception Distance (FID) scores and better visual fidelity than existing methods. We also conducted a user study involving 15 participants to interactively evaluate the quality of the generated results in VR, demonstrating the flexibility and superiority of our Dream360 framework.
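
To make the phrase "spherical harmonic (SH) values" concrete, the following is a minimal sketch of how real SH basis values of degrees 0 and 1 could be evaluated at every pixel of an equirectangular panorama. It is not the S-VQGAN from the paper: the abstract does not say how SH values enter the codebook, and the function names `equirect_directions` and `real_sh_degree1`, the grid size, and the axis convention are all assumptions made for illustration.

```python
import numpy as np

def equirect_directions(height, width):
    """Unit direction vectors at the pixel centers of an equirectangular image.

    Hypothetical helper: rows map to the polar angle theta in (0, pi),
    columns map to the azimuth phi in (-pi, pi).
    """
    theta = (np.arange(height) + 0.5) / height * np.pi
    phi = (np.arange(width) + 0.5) / width * 2.0 * np.pi - np.pi
    theta, phi = np.meshgrid(theta, phi, indexing="ij")
    x = np.sin(theta) * np.cos(phi)
    y = np.sin(theta) * np.sin(phi)
    z = np.cos(theta)
    return np.stack([x, y, z], axis=-1)  # shape (H, W, 3)

def real_sh_degree1(dirs):
    """Real spherical harmonics of degrees 0 and 1, shape (H, W, 4)."""
    x, y, z = dirs[..., 0], dirs[..., 1], dirs[..., 2]
    c0 = 0.5 * np.sqrt(1.0 / np.pi)      # constant term Y_0^0
    c1 = np.sqrt(3.0 / (4.0 * np.pi))    # shared degree-1 normalization
    return np.stack([np.full_like(x, c0), c1 * y, c1 * z, c1 * x], axis=-1)

# Per-pixel SH values for a small 64x128 panorama grid (size chosen arbitrarily).
sh = real_sh_degree1(equirect_directions(64, 128))
```

Each pixel receives a short vector that depends only on its direction on the sphere, not on its row and column in the flattened image, which is one way a model can be made aware of the spherical geometry the abstract refers to.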

References (70)
  1. Hrdfuse: Monocular 360° depth estimation by collaboratively learning holistic-with-regional depth distributions. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 13273–13282, 2023.
  2. 360-degree image completion by two-stage conditional gans. 2019 IEEE International Conference on Image Processing (ICIP), pp. 4704–4708, 2019.
  3. Diverse plausible 360-degree image outpainting for efficient 3dcg background creation. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11431–11440, 2022.
  4. Stereoscopic rendering of virtual environments with wide field-of-views up to 360°. 2014 IEEE Virtual Reality (VR), pp. 3–8, 2014.
  5. Augmenting immersive telepresence experience with a virtual body. IEEE Transactions on Visualization and Computer Graphics, 28:2135–2145, 2022.
  6. Frequency domain image translation: More photo-realistic, better identity-preserving. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 13930–13940, 2021.
  7. Ntire 2023 challenge on 360° omnidirectional image and video super-resolution: Datasets, methods and results. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1731–1745, 2023.
  8. Maskgit: Masked generative image transformer. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11305–11315, 2022.
  9. Y. Chen and X. Wang. Transformers as meta-learners for implicit neural representations. In European Conference on Computer Vision, 2022.
  10. Text2light: Zero-shot text-driven hdr panorama generation. ACM Trans. Graph., 41:195:1–195:16, 2022.
  11. Inout: Diverse image outpainting via gan inversion. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11421–11430, 2022.
  12. Tangent images for mitigating spherical distortion. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12423–12431, 2019.
  13. A. A. Efros and T. Leung. Texture synthesis by non-parametric sampling. International Conference on Computer Vision, 1999.
  14. Taming transformers for high-resolution image synthesis. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12868–12878, 2021.
  15. Se(3)-transformers: 3d roto-translation equivariant attention networks. In Advances in Neural Information Processing Systems 34 (NeurIPS), 2020.
  16. Fourier space losses for efficient perceptual image super-resolution. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 2340–2349, 2021.
  17. Swagan: A style-based wavelet-driven generative model. ACM Transactions on Graphics (TOG), 40(4):1–11, 2021.
  18. Generative adversarial nets. In NIPS, 2014.
  19. Piinet: A 360-degree panoramic image inpainting network using a cube map. ArXiv, abs/2010.16003, 2020.
  20. Spherical image generation from a single image by considering scene symmetry. In AAAI, 2021.
  21. Gans trained by a two time-scale update rule converge to a local nash equilibrium. In NIPS, 2017.
  22. Augmented reality image generation with optical consistency using generative adversarial networks. 2020 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW), pp. 615–616, 2020.
  23. Image-to-image translation with conditional adversarial networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5967–5976, 2016.
  24. Focal frequency loss for image reconstruction and synthesis. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 13899–13909, 2021.
  25. Sun-sky model estimation from outdoor images. Journal of Ambient Intelligence and Humanized Computing, 13:5151 – 5162, 2020.
  26. S. Jung and M. Keuper. Spectral distribution aware image generation. In Proceedings of the AAAI conference on artificial intelligence, number 2, pp. 1734–1742, 2021.
  27. Painting outside as inside: Edge guided image outpainting via bidirectional rearrangement with progressive step learning. 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 2121–2129, 2021.
  28. D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2014.
  29. D. P. Kingma and M. Welling. Auto-encoding variational bayes. CoRR, abs/1312.6114, 2014.
  30. Bullet comments for 360° video. 2022 IEEE Conference on Virtual Reality and 3D User Interfaces (VR), pp. 1–10, 2022.
  31. Cylin-painting: Seamless 360° panoramic image outpainting and beyond with cylinder-style convolutions. ArXiv, abs/2204.08563, 2022.
  32. Lighting estimation via differentiable screen-space rendering. 2020 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW), pp. 575–576, 2020.
  33. Spectral regularization for combating mode collapse in gans. In Proceedings of the IEEE/CVF international conference on computer vision, pp. 6382–6390, 2019.
  34. Least squares generative adversarial networks. 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2813–2821, 2016.
  35. T. Marrinan and M. E. Papka. Real-time omnidirectional stereo rendering: Generating 360° surround-view panoramic images for comfortable immersive viewing. IEEE Transactions on Visualization and Computer Graphics, 27:2587–2596, 2021.
  36. Scangan360: A generative model of realistic scanpaths for 360° images. IEEE Transactions on Visualization and Computer Graphics, 28:2003–2013, 2022.
  37. Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99–106, 2021.
  38. M. Mirza and S. Osindero. Conditional generative adversarial nets. ArXiv, abs/1411.1784, 2014.
  39. On aliased resizing and surprising subtleties in gan evaluation. In CVPR, 2022.
  40. Context encoders: Feature learning by inpainting. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2536–2544, 2016.
  41. Generating diverse structure for image inpainting with hierarchical vq-vae. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10770–10779, 2021.
  42. Photorealistic rendering for augmented reality: A global illumination and brdf solution. 2010 IEEE Virtual Reality Conference (VR), pp. 3–10, 2010.
  43. On the spectral bias of neural networks. In International Conference on Machine Learning, 2018.
  44. R. Ramamoorthi. Analytic pca construction for theoretical analysis of lighting variability in images of a lambertian object. IEEE Trans. Pattern Anal. Mach. Intell., 24:1322–1333, 2002.
  45. R. Ramamoorthi and P. Hanrahan. A signal-processing framework for reflection. ACM Transactions on Graphics (TOG), 23:1004 – 1042, 2004.
  46. High-resolution image synthesis with latent diffusion models. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10674–10685, 2021.
  47. 360 panorama synthesis from a sparse set of images with unknown field of view. 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 2375–2384, 2020.
  48. Weighted-to-spherically-uniform quality evaluation for omnidirectional video. IEEE Signal Processing Letters, 24:1408–1412, 2017.
  49. Fourier features let networks learn high frequency functions in low dimensional domains. In NeurIPS, 2020.
  50. Neural discrete representation learning. In NIPS, pp. 6306–6315, 2017.
  51. Attention is all you need. In NIPS, 2017.
  52. A. Vermast and W. Hürst. Introducing 3d thumbnails to access 360-degree videos in virtual reality. IEEE Transactions on Visualization and Computer Graphics, 29:2547–2556, 2023.
  53. An uncertain future: Forecasting from static images using variational autoencoders. ArXiv, abs/1606.07873, 2016.
  54. A comparison of visual attention guiding approaches for 360° image-based vr tours. 2020 IEEE Conference on Virtual Reality and 3D User Interfaces (VR), pp. 83–91, 2020.
  55. 360-degree panorama generation from few unregistered nfov images. ArXiv, abs/2308.14686, 2023.
  56. Bidirectional shadow rendering for interactive mixed 360° videos. 2021 IEEE Virtual Reality and 3D User Interfaces (VR), pp. 170–178, 2021.
  57. Biggerpicture. ACM Transactions on Graphics (TOG), 33:1 – 13, 2014.
  58. Transitioning360: Content-aware nfov virtual camera paths for 360° video playback. 2020 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), pp. 185–194, 2020.
  59. High-fidelity gan inversion for image attribute editing. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11369–11378, 2022.
  60. Wide-context semantic image extrapolation. Computer Vision and Pattern Recognition, 2019.
  61. Ipo-ldm: Depth-aided 360-degree indoor rgb panorama outpainting via latent diffusion model. ArXiv, abs/2307.03177, 2023.
  62. Recognizing scene viewpoint using panoramic place representation. Computer Vision and Pattern Recognition, 2012.
  63. Fast and accurate spherical harmonics products. ACM Transactions on Graphics (TOG), 40:1 – 14, 2021.
  64. Real-time illumination estimation for mixed reality on mobile devices. 2020 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW), pp. 703–704, 2020.
  65. Scene graph expansion for semantics-guided image outpainting. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 15596–15605, 2022.
  66. Diverse image inpainting with bidirectional and autoregressive transformers. In Proceedings of the 29th ACM International Conference on Multimedia, pp. 69–78, 2021.
  67. The unreasonable effectiveness of deep features as a perceptual metric. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 586–595, 2018.
  68. Uctgan: Diverse image inpainting based on unsupervised cross-space translation. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5740–5749, 2020.
  69. Pluralistic image completion. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1438–1447, 2019.
  70. The prediction of saliency map for head and eye movements in 360 degree images. IEEE Transactions on Multimedia, 22:2331–2344, 2020.