Polygon Detection for Room Layout Estimation using Heterogeneous Graphs and Wireframes (2306.12203v1)
Abstract: This paper presents a neural-network-based semantic plane detection method that uses polygon representations. The method can, for example, be used to solve room layout estimation tasks. It builds on, combines, and further develops several modules from previous research. The network takes an RGB image and, using an hourglass backbone, estimates a wireframe as well as a feature space. Line and junction features are sampled from these. The lines and junctions are then represented as an undirected graph, from which polygon representations of the sought planes are obtained. Two methods for this last step are investigated; the more promising one is built on a heterogeneous graph transformer. In all cases, the final output is a 2D projection of the semantic planes. The methods are evaluated on the Structured3D dataset, and performance is investigated using both sampled and estimated wireframes. The experiments show the potential of the graph-based method: using synthetic wireframe detections, it outperforms state-of-the-art room layout estimation methods on the 2D metrics.
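The abstract's final step turns a graph of junctions and lines into plane polygons. The paper's preferred approach is a learned heterogeneous graph transformer. For intuition about what "polygons from a wireframe graph" means, here is a classical, non-learned point of comparison, which is not the authors' method. It treats the wireframe as a planar straight-line graph and enumerates its bounded faces by walking half-edges. The sketch assumes exact junction coordinates and a connected wireframe with no crossing segments. The names `planar_faces` and `signed_area` are hypothetical.

```python
import math
from collections import defaultdict

Point = tuple[float, float]


def signed_area(junctions: list[Point], polygon: list[int]) -> float:
    """Shoelace formula: positive when the vertices wind counter-clockwise
    in the given coordinate system (works unchanged for y-down image axes)."""
    total = 0.0
    for i, a in enumerate(polygon):
        b = polygon[(i + 1) % len(polygon)]
        total += junctions[a][0] * junctions[b][1] - junctions[b][0] * junctions[a][1]
    return total / 2.0


def planar_faces(junctions: list[Point], lines: list[tuple[int, int]]) -> list[list[int]]:
    """Return the bounded faces of a planar wireframe as lists of junction indices.

    Hypothetical helper: assumes no two line segments cross.
    """
    # Undirected graph: junction index -> neighbouring junction indices.
    adjacency: dict[int, list[int]] = defaultdict(list)
    for a, b in lines:
        adjacency[a].append(b)
        adjacency[b].append(a)

    # Sort each junction's neighbours by the angle of the outgoing edge.
    for v, nbrs in adjacency.items():
        vx, vy = junctions[v]
        nbrs.sort(key=lambda w: math.atan2(junctions[w][1] - vy, junctions[w][0] - vx))

    # Index of neighbour w in v's sorted list; keys are all directed half-edges v->w.
    position = {(v, w): i for v, nbrs in adjacency.items() for i, w in enumerate(nbrs)}

    visited: set[tuple[int, int]] = set()
    faces: list[list[int]] = []
    for start in position:
        if start in visited:
            continue
        face: list[int] = []
        u, v = start
        # Follow successor half-edges until the walk returns to its start.
        while (u, v) not in visited:
            visited.add((u, v))
            face.append(u)
            nbrs = adjacency[v]
            # Taking the neighbour just before u in angular order keeps the face on the left.
            w = nbrs[(position[(v, u)] - 1) % len(nbrs)]
            u, v = v, w
        # Bounded faces come out with positive area; the outer face is negative.
        if signed_area(junctions, face) > 1e-9:
            faces.append(face)
    return faces


if __name__ == "__main__":
    # Toy room in y-down image coordinates: image border, back-wall rectangle,
    # and the four room edges joining back-wall corners to image corners.
    J = [(0, 0), (4, 0), (4, 3), (0, 3), (1, 1), (3, 1), (3, 2), (1, 2)]
    L = [(0, 1), (1, 2), (2, 3), (3, 0), (4, 5), (5, 6), (6, 7), (7, 4),
         (0, 4), (1, 5), (2, 6), (3, 7)]
    faces = planar_faces(J, L)
    print(len(faces))  # 5: back wall, ceiling, floor and two side walls
```

Every line and junction is taken as exact here, so a single missed or spurious segment changes the resulting polygons. This sensitivity to detection noise plausibly explains why a learned step is attractive when wireframes are estimated rather than sampled.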