PanoDiffusion: 360-degree Panorama Outpainting via Diffusion (2307.03177v6)
Abstract: Generating complete 360-degree panoramas from narrow field-of-view images remains an open research problem, as omnidirectional RGB data is not readily available. Existing GAN-based approaches struggle to produce high-quality output and generalize poorly across different mask types. In this paper, we present PanoDiffusion, a 360-degree indoor RGB-D panorama outpainting model built on latent diffusion models (LDM). We introduce a new bi-modal latent diffusion structure that uses both RGB and depth panoramic data during training, and that works surprisingly well for outpainting RGB images at inference even when no depth input is available. We further propose a novel technique that introduces progressive camera rotations during each diffusion denoising step, which substantially improves panorama wraparound consistency. Results show that PanoDiffusion not only significantly outperforms state-of-the-art methods on RGB-D panorama outpainting, producing diverse, well-structured results for different types of masks, but can also synthesize high-quality depth panoramas that provide realistic 3D indoor models.
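To make the camera-rotation idea concrete: in an equirectangular panorama, rotating the camera about the vertical axis is equivalent to circularly shifting the image columns, so the left and right borders move into the interior where a denoiser can treat them like any other region. The sketch below illustrates that mechanism only. It is not the authors' implementation: the function names, the fixed `shift_per_step` schedule, the final un-rotation, and the `denoise_step` callable (a stand-in for a trained denoiser) are all assumptions made for illustration.

```python
import numpy as np


def rotate_yaw(pano, shift):
    """Rotate an equirectangular array about the vertical axis.

    A yaw rotation is a circular shift of columns (the last axis).
    """
    return np.roll(pano, shift, axis=-1)


def denoise_with_rotation(latent, denoise_step, num_steps, shift_per_step):
    """Apply `denoise_step` for `num_steps` steps, rotating before each step.

    `denoise_step(latent, t)` is a hypothetical callable standing in for one
    reverse-diffusion step. The constant per-step shift is an assumption;
    the abstract does not specify the rotation schedule.
    """
    total_shift = 0
    for t in reversed(range(num_steps)):
        # Move the wraparound seam to a new column before this step.
        latent = rotate_yaw(latent, shift_per_step)
        total_shift += shift_per_step
        latent = denoise_step(latent, t)
    # Undo the accumulated rotation so the output aligns with the input view.
    return rotate_yaw(latent, -total_shift)


if __name__ == "__main__":
    # Toy latent of shape (channels, height, width).
    latent = np.arange(2 * 4 * 8, dtype=np.float64).reshape(2, 4, 8)

    # An identity "denoiser" isolates the rotation bookkeeping.
    out = denoise_with_rotation(latent, lambda x, t: x, num_steps=5, shift_per_step=3)
    print(np.array_equal(out, latent))
```

With an identity step the function returns its input unchanged, which confirms that the rotations cancel. With a real denoiser, each step would see the seam at a different horizontal position.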