CenterDisks: Real-time instance segmentation with disk covering (2403.03296v1)

Published 5 Mar 2024 in cs.CV

Abstract: Increasing the accuracy of instance segmentation methods is often done at the expense of speed. Using coarser representations, we can reduce the number of parameters and thus obtain real-time masks. In this paper, we take inspiration from the set cover problem to predict mask approximations. Given ground-truth binary masks of objects of interest as training input, our method learns to predict the approximate coverage of these objects by disks without supervision on their location or radius. Each object is represented by a fixed number of disks with different radii. In the learning phase, we consider the radius as proportional to a standard deviation so that the error can be propagated through a set of two-dimensional Gaussian functions rather than hard disks. We trained and tested our instance segmentation method on challenging datasets showing dense urban settings with various road users. Our method achieves state-of-the-art results on the IDD and KITTI datasets with an inference time of 0.040 s on a single RTX 3090 GPU.
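The disk-to-Gaussian relaxation described in the abstract lends itself to a simple differentiable rendering step. The sketch below only illustrates that idea and is not the authors' implementation: the function and parameter names (render_soft_mask, alpha, dice_loss), the pixel-wise-maximum aggregation, and the choice of a Dice-style overlap loss are assumptions of this sketch, and in the actual method the disk centers and radii would be regressed by a detection network rather than optimized directly.

```python
import torch

def render_soft_mask(centers, radii, height, width, alpha=1.0):
    """Render a differentiable soft mask from disk parameters.

    Each disk is replaced by a 2D isotropic Gaussian whose standard deviation
    is proportional to the disk radius (sigma = alpha * radius, where the
    constant `alpha` is an assumption of this sketch).
    centers: (N, 2) tensor of (x, y) coordinates; radii: (N,) tensor.
    Returns an (H, W) tensor with values in [0, 1].
    """
    ys = torch.arange(height, dtype=torch.float32).view(-1, 1)  # (H, 1)
    xs = torch.arange(width, dtype=torch.float32).view(1, -1)   # (1, W)
    mask = torch.zeros(height, width)
    for (cx, cy), r in zip(centers, radii):
        sigma = alpha * r.clamp(min=1.0)  # keep sigma strictly positive
        g = torch.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
        # Approximate the union of disks with a pixel-wise maximum of Gaussians.
        mask = torch.maximum(mask, g)
    return mask

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss between a predicted soft mask and a binary ground-truth mask."""
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

# Toy example: fit 8 disks to a rectangular ground-truth mask by gradient descent.
torch.manual_seed(0)
centers = (torch.rand(8, 2) * 128).requires_grad_(True)
radii = torch.full((8,), 10.0, requires_grad=True)
gt = torch.zeros(128, 128)
gt[32:96, 40:100] = 1.0

opt = torch.optim.Adam([centers, radii], lr=1.0)
for _ in range(100):
    opt.zero_grad()
    loss = dice_loss(render_soft_mask(centers, radii, 128, 128), gt)
    loss.backward()  # gradients flow back to disk centers and radii
    opt.step()
```

Taking the pixel-wise maximum rather than the sum keeps the rendered coverage bounded in [0, 1] where disks overlap; the aggregation and training losses used in the paper itself may differ from this sketch.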

