
CornerFormer: Boosting Corner Representation for Fine-Grained Structured Reconstruction (2304.07072v4)

Published 14 Apr 2023 in cs.CV and cs.AI

Abstract: Structured reconstruction is a non-trivial dense prediction problem that extracts structural information (e.g., building corners and edges) from a raster image and reconstructs it as a 2D planar graph. Compared with common segmentation or detection problems, it relies significantly on the ability to leverage holistic geometric information for structural reasoning. Current transformer-based approaches tackle this challenging problem in a two-stage manner, detecting corners with a first model and classifying the proposed edges (corner pairs) with a second model. However, they split the two stages into separate models that share only the backbone encoder. Unlike the existing modeling strategies, we present an enhanced corner representation method: 1) it fuses knowledge between corner detection and edge prediction by sharing features at different granularities; 2) corner candidates are proposed in four heatmap channels according to their direction. Both qualitative and quantitative evaluations demonstrate that our proposed method better reconstructs fine-grained structures, such as adjacent corners and tiny edges. Consequently, it outperforms the state-of-the-art model by +1.9%@F-1 on Corner and +3.0%@F-1 on Edge.
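To make the second point concrete, the following is a minimal sketch of how corner candidates might be read out of a four-channel heatmap. It is an illustration under stated assumptions, not the authors' implementation: the function name `extract_corner_candidates`, the 3x3 local-maximum rule, and the `threshold` value are all hypothetical, and the abstract does not say how the paper decodes its heatmaps. What the sketch does show is why separate channels can help with adjacent corners: two neighbouring peaks suppress each other only when they fall in the same channel.

```python
import numpy as np


def extract_corner_candidates(heatmaps, threshold=0.5):
    """Return (channel, row, col, score) for every local peak above threshold.

    heatmaps: array of shape (4, H, W), one channel per direction bin.
    The 3x3 peak rule and the threshold are illustrative assumptions.
    """
    candidates = []
    num_channels, height, width = heatmaps.shape
    for channel in range(num_channels):
        hm = heatmaps[channel]
        # Pad with -inf so border pixels also see a full 3x3 neighbourhood.
        padded = np.pad(hm, 1, mode="constant", constant_values=-np.inf)
        # Stack the nine shifted views and take the per-pixel maximum.
        windows = np.stack([
            padded[dy:dy + height, dx:dx + width]
            for dy in range(3)
            for dx in range(3)
        ])
        local_max = windows.max(axis=0)
        # A peak equals its neighbourhood maximum and exceeds the threshold.
        peaks = (hm >= local_max) & (hm > threshold)
        for row, col in zip(*np.nonzero(peaks)):
            candidates.append((channel, int(row), int(col), float(hm[row, col])))
    return candidates


# Toy input: two adjacent responses in channel 0, one response in channel 1.
toy = np.zeros((4, 8, 8))
toy[0, 2, 3] = 0.9
toy[0, 2, 4] = 0.8  # suppressed by the stronger neighbour in the same channel
toy[1, 2, 4] = 0.7  # kept, because it lives in a different channel
result = extract_corner_candidates(toy)
```

In this toy input, the weaker response in channel 0 is suppressed by its stronger neighbour, while the response at the same pixel in channel 1 is kept.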

