
Symbol as Points: Panoptic Symbol Spotting via Point-based Representation

Published 19 Jan 2024 in cs.CV and cs.GR (arXiv:2401.10556v1)

Abstract: This work studies the problem of panoptic symbol spotting, which is to spot and parse both countable object instances (windows, doors, tables, etc.) and uncountable stuff (wall, railing, etc.) from computer-aided design (CAD) drawings. Existing methods typically either rasterize the vector graphics into images and apply image-based methods for symbol spotting, or build graphs directly and apply graph neural networks for symbol recognition. In this paper, we take a different approach, which treats graphic primitives as a set of locally connected 2D points and uses point cloud segmentation methods to process them. Specifically, we use a point transformer to extract the primitive features and append a Mask2Former-like spotting head to predict the final output. To better use the local connection information of primitives and enhance their discriminability, we further propose the attention with connection module (ACM) and the contrastive connection learning scheme (CCL). Finally, we propose a KNN interpolation mechanism for the mask attention module of the spotting head to better handle primitive mask downsampling, which operates at the primitive level rather than at the pixel level as in images. Our approach, named SymPoint, is simple yet effective, outperforming the recent state-of-the-art method GAT-CADNet by an absolute increase of 9.6% PQ and 10.4% RQ on the FloorPlanCAD dataset. The source code and models will be available at https://github.com/nicehuster/SymPoint.
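
To make the point-based view concrete, the following is a minimal sketch of two ideas named in the abstract: representing a primitive as a 2D point, and downsampling a per-primitive mask with k-nearest-neighbor interpolation. It is not the SymPoint implementation. The abstract does not specify the point features or the interpolation rule, so the choice of the segment midpoint as the point, the segment length as a feature, and plain averaging over neighbors are all assumptions, and the function names segment_to_point, knn_indices, and knn_downsample_mask are hypothetical.

```python
import math


def segment_to_point(x1, y1, x2, y2):
    """Map a line-segment primitive to one 2D point plus its length.

    Assumption: the midpoint stands in for the primitive's position.
    """
    center = ((x1 + x2) / 2.0, (y1 + y2) / 2.0)
    length = math.hypot(x2 - x1, y2 - y1)
    return center, length


def knn_indices(query, points, k):
    """Return indices of the k points closest to query (brute force)."""
    order = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))
    return order[:k]


def knn_downsample_mask(points, mask, kept, k=3):
    """Downsample a per-primitive mask to the subset of points in `kept`.

    Assumption: each kept point takes the mean mask value of its k nearest
    full-resolution points (the point itself is included as a neighbor).
    """
    downsampled = []
    for idx in kept:
        neighbors = knn_indices(points[idx], points, k)
        downsampled.append(sum(mask[j] for j in neighbors) / len(neighbors))
    return downsampled


# Four segments: two near the origin (inside the mask), two far away (outside).
segments = [(0, 0, 2, 0), (2, 0, 4, 0), (10, 0, 12, 0), (12, 0, 14, 0)]
points = [segment_to_point(*s)[0] for s in segments]
mask = [1, 1, 0, 0]

# Keep only the first and last primitive at the coarser level.
coarse_mask = knn_downsample_mask(points, mask, kept=[0, 3], k=2)
print(coarse_mask)
```

Because the neighbors are looked up among primitives rather than on a pixel grid, the coarse mask follows the irregular layout of the drawing, which is the motivation the abstract gives for a primitive-level scheme.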
