CornerFormer: Boosting Corner Representation for Fine-Grained Structured Reconstruction (2304.07072v4)
Abstract: Structured reconstruction is a non-trivial dense prediction problem: it extracts structural information (e.g., building corners and edges) from a raster image and reconstructs a 2D planar graph from it. Compared with common segmentation or detection problems, it relies heavily on the ability to leverage holistic geometric information for structural reasoning. Current transformer-based approaches tackle this problem in two stages, detecting corners with a first model and classifying the proposed edges (corner pairs) with a second model. However, they split the two stages into separate models that share only the backbone encoder. Unlike existing modeling strategies, we present an enhanced corner representation method: 1) it fuses knowledge between corner detection and edge prediction by sharing features at different granularities; 2) corner candidates are proposed in four heatmap channels according to their direction. Both qualitative and quantitative evaluations demonstrate that our proposed method better reconstructs fine-grained structures, such as adjacent corners and tiny edges. Consequently, it outperforms the state-of-the-art model by +1.9% F-1 on corners and +3.0% F-1 on edges.
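The two-stage pattern described in the abstract (propose corners from heatmaps, then treat corner pairs as edge candidates) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `pick_corners` and `propose_edges`, the 0.5 threshold, and the 3x3 local-maximum test are all assumptions made for illustration, and the abstract does not say how the four direction channels are decoded.

```python
from itertools import combinations


def pick_corners(heatmaps, threshold=0.5):
    """Return (x, y, channel) for each strict local maximum at or above threshold.

    heatmaps: list of 2D lists, one per (hypothetical) direction channel.
    The threshold and the 3x3 window are illustrative assumptions.
    """
    corners = []
    for c, hm in enumerate(heatmaps):
        h, w = len(hm), len(hm[0])
        for y in range(h):
            for x in range(w):
                v = hm[y][x]
                if v < threshold:
                    continue
                # Collect the 3x3 neighbourhood, clipped at the image border.
                neighbours = [
                    hm[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))
                    if (i, j) != (x, y)
                ]
                if all(v > n for n in neighbours):
                    corners.append((x, y, c))
    return corners


def propose_edges(corners):
    """Enumerate every unordered corner pair as an edge candidate (index pairs)."""
    return list(combinations(range(len(corners)), 2))


# Toy input: four 4x4 channels with three isolated peaks.
heatmaps = [[[0.0] * 4 for _ in range(4)] for _ in range(4)]
heatmaps[0][1][1] = 0.9  # channel 0, pixel (x=1, y=1)
heatmaps[2][2][3] = 0.8  # channel 2, pixel (x=3, y=2)
heatmaps[3][3][0] = 0.7  # channel 3, pixel (x=0, y=3)

corners = pick_corners(heatmaps)
edges = propose_edges(corners)
```

In a real pipeline, a second-stage classifier would score each candidate pair in `edges`; here the pairs are only enumerated, which shows why the number of candidates grows quadratically with the number of detected corners.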