CenterDisks: Real-time instance segmentation with disk covering (2403.03296v1)
Abstract: Increasing the accuracy of instance segmentation methods is often done at the expense of speed. Using coarser representations, we can reduce the number of parameters and thus obtain real-time masks. In this paper, we take inspiration from the set cover problem to predict mask approximations. Given ground-truth binary masks of objects of interest as training input, our method learns to predict the approximate coverage of these objects by disks without supervision on their location or radius. Each object is represented by a fixed number of disks with different radii. In the learning phase, we consider the radius as proportional to a standard deviation in order to compute the error to propagate on a set of two-dimensional Gaussian functions rather than disks. We trained and tested our instance segmentation method on challenging datasets showing dense urban settings with various road users. Our method achieves state-of-the-art results on the IDD and KITTI datasets with an inference time of 0.040 s on a single RTX 3090 GPU.
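The relaxation from disks to Gaussians described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the `radius_to_sigma` proportionality constant, the per-pixel maximum used to combine the Gaussians, and the Dice-style overlap loss are all assumptions chosen for illustration. The sketch only shows why a Gaussian surrogate gives a smooth error signal with respect to disk centers and radii, whereas a hard disk mask does not.

```python
import numpy as np


def soft_disk_mask(centers, radii, height, width, radius_to_sigma=1.0):
    """Soft mask from K disks, each replaced by an isotropic 2D Gaussian.

    Assumption: sigma = radius_to_sigma * radius, and the K Gaussians are
    combined with a per-pixel maximum. Both choices are illustrative.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    mask = np.zeros((height, width), dtype=np.float64)
    for (cx, cy), r in zip(centers, radii):
        sigma = radius_to_sigma * r
        d2 = (xs - cx) ** 2 + (ys - cy) ** 2
        mask = np.maximum(mask, np.exp(-d2 / (2.0 * sigma ** 2)))
    return mask


def hard_disk_mask(centers, radii, height, width):
    """Binary union of the same K disks, as one might use at inference time."""
    ys, xs = np.mgrid[0:height, 0:width]
    mask = np.zeros((height, width), dtype=bool)
    for (cx, cy), r in zip(centers, radii):
        mask |= (xs - cx) ** 2 + (ys - cy) ** 2 <= r ** 2
    return mask


def soft_dice_loss(pred, target, eps=1e-6):
    """Dice-style overlap loss between a soft mask and a binary mask.

    Hypothetical stand-in for the training error; the abstract does not
    specify the loss.
    """
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)


# Toy ground-truth mask: a 20 x 40 rectangle inside a 64 x 64 image.
target = np.zeros((64, 64), dtype=np.float64)
target[22:42, 12:52] = 1.0

# Two disks placed on the rectangle, given as (x, y) centers and radii.
centers = [(22.0, 32.0), (42.0, 32.0)]
radii = [10.0, 10.0]

soft = soft_disk_mask(centers, radii, 64, 64)
hard = hard_disk_mask(centers, radii, 64, 64)
print("soft Dice loss:", soft_dice_loss(soft, target))
print("pixels covered by hard disks:", int(hard.sum()))
```

In an actual training setup the centers and radii would be network outputs and the loss would be computed in an automatic-differentiation framework; the NumPy version above only illustrates the forward computation under the stated assumptions.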