Seg2Reg: Differentiable 2D Segmentation to 1D Regression Rendering for 360 Room Layout Reconstruction (2311.18695v1)
Abstract: State-of-the-art single-view 360-degree room layout reconstruction methods formulate the problem as a high-level 1D (per-column) regression task. Traditional low-level 2D layout segmentation, by contrast, is simpler to learn and can represent occluded regions, but it requires complex post-processing to obtain the target layout polygon and sacrifices accuracy. We present Seg2Reg, which renders 1D layout depth regression from the 2D segmentation map in a differentiable and occlusion-aware way, combining the merits of both formulations. Specifically, our model predicts floor-plan density for the input equirectangular 360-degree image. Formulating the 2D layout representation as a density field enables us to employ 'flattened' volume rendering to form the 1D layout depth regression. In addition, we propose a novel 3D warping augmentation on the layout to improve generalization. Finally, we re-implement recent room layout reconstruction methods in our codebase for benchmarking and explore modern backbones and training techniques to serve as strong baselines. Our model significantly outperforms prior art. The code will be made available upon publication.
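To make the idea of 'flattened' volume rendering concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a square floor-plan density grid with the camera at its center and applies the standard volume rendering quadrature (alpha compositing of per-sample opacities) along 2D rays, one ray per image column, to obtain an expected layout depth per column. The function name `render_layout_depth` and the parameters `extent`, `n_columns`, and `n_samples` are hypothetical, as is the nearest-neighbour density lookup; an actual training setup would use an automatic differentiation framework so that gradients flow back to the predicted density.

```python
import numpy as np

def render_layout_depth(density, extent, n_columns=256, n_samples=128):
    """Render per-column layout depth from a 2D floor-plan density grid.

    density: (H, W) array of non-negative densities; the camera is assumed
             to sit at the center of the grid (illustrative assumption).
    extent:  half-width of the square floor-plan region, in meters.
    """
    H, W = density.shape
    # One ray per image column, covering the full 360 degrees.
    angles = np.linspace(-np.pi, np.pi, n_columns, endpoint=False)
    # Sample distances: midpoints of equal-length intervals along each ray.
    delta = extent / n_samples
    t = (np.arange(n_samples) + 0.5) * delta
    # Sample positions on the floor plan, shape (n_columns, n_samples).
    x = t[None, :] * np.cos(angles)[:, None]
    y = t[None, :] * np.sin(angles)[:, None]
    # Nearest-neighbour lookup of the density at each sample.
    col = np.clip(((x + extent) / (2 * extent) * W).astype(int), 0, W - 1)
    row = np.clip(((y + extent) / (2 * extent) * H).astype(int), 0, H - 1)
    sigma = density[row, col]
    # Opacity of each interval and transmittance up to (not including) it.
    alpha = 1.0 - np.exp(-sigma * delta)
    trans = np.cumprod(1.0 - alpha, axis=1)
    trans = np.concatenate([np.ones((n_columns, 1)), trans[:, :-1]], axis=1)
    weights = trans * alpha
    # Expected termination distance of each ray = layout depth per column.
    return (weights * t[None, :]).sum(axis=1)

# Toy example: an empty 4 m x 4 m square room surrounded by dense "wall".
extent, size = 4.0, 400
centers = (np.arange(size) + 0.5) / size * 2 * extent - extent
xs, ys = np.meshgrid(centers, centers)
density = np.where((np.abs(xs) < 2.0) & (np.abs(ys) < 2.0), 0.0, 1000.0)
depth = render_layout_depth(density, extent)
print(depth.shape, depth.min(), depth.max())
```

In this toy room the rendered depth is close to 2 m for rays facing a wall head-on and close to 2.83 m toward the corners, up to the sampling step. If a ray never meets enough density, its weights sum to less than one and the expected depth is underestimated, which is one reason a practical system needs care with the far bound.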