Windowed-FourierMixer: Enhancing Clutter-Free Room Modeling with Fourier Transform (2402.18287v1)

Published 28 Feb 2024 in cs.CV

Abstract: With the growing demand for immersive digital applications, the need to understand and reconstruct 3D scenes has significantly increased. In this context, inpainting indoor environments from a single image plays a crucial role in modeling the internal structure of interior spaces, as it enables the creation of textured and clutter-free reconstructions. While recent methods have shown significant progress in room modeling, they rely on constraining layout estimators to guide the reconstruction process. These methods are highly dependent on the performance of the structure estimator and its generative ability in heavily occluded environments. In response to these issues, we propose an innovative approach based on a U-Former architecture and a new Windowed-FourierMixer block, resulting in a unified, single-phase network capable of effectively handling human-made periodic structures such as indoor spaces. This new architecture proves advantageous for tasks involving indoor scenes where symmetry is prevalent, allowing the model to effectively capture features such as horizon/ceiling height lines and cuboid-shaped rooms. Experiments show that the proposed approach outperforms current state-of-the-art methods on the Structured3D dataset, demonstrating superior performance in both quantitative metrics and qualitative results. Code and models will be made publicly available.
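
The abstract names a Windowed-FourierMixer block but does not describe its internals. As a rough illustration only of the general idea that the name suggests (mixing features with a Fourier transform inside local windows), the NumPy sketch below splits a feature map into non-overlapping square windows and applies a 2D FFT within each one, keeping the real part. The function name `windowed_fourier_mix`, the window layout, and the choice to keep the real part are all assumptions made for this example. It is not the authors' implementation and has no learned parameters.

```python
import numpy as np


def windowed_fourier_mix(x, window):
    """Hypothetical sketch: mix features with a 2D FFT inside each window.

    x      : array of shape (H, W, C), a feature map.
    window : side length of the square windows; must divide H and W.
    """
    h, w, c = x.shape
    if h % window or w % window:
        raise ValueError("window must divide both spatial dimensions")

    # Split the map into non-overlapping windows:
    # (H/window, window, W/window, window, C).
    tiles = x.reshape(h // window, window, w // window, window, c)

    # 2D FFT over the two within-window axes; keeping the real part is an
    # assumption of this sketch, not something stated in the abstract.
    mixed = np.fft.fft2(tiles, axes=(1, 3)).real

    # Undo the window split to recover the original (H, W, C) layout.
    return mixed.reshape(h, w, c)


if __name__ == "__main__":
    features = np.random.default_rng(0).normal(size=(8, 8, 3))
    out = windowed_fourier_mix(features, window=4)
    print(out.shape)
```

Because the transform is applied per window, a change to one pixel affects only the outputs inside that pixel's window. This locality is the property the "windowed" part of the sketch is meant to show.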

