
Recurrent Generic Contour-based Instance Segmentation with Progressive Learning (2301.08898v3)

Published 21 Jan 2023 in cs.CV

Abstract: Contour-based instance segmentation has been actively studied, thanks to its flexibility and elegance in processing visual objects within complex backgrounds. In this work, we propose a novel deep network architecture, i.e., PolySnake, for generic contour-based instance segmentation. Motivated by the classic Snake algorithm, the proposed PolySnake achieves superior and robust segmentation performance with an iterative and progressive contour refinement strategy. Technically, PolySnake introduces a recurrent update operator to estimate the object contour iteratively. It maintains a single estimate of the contour that is progressively deformed toward the object boundary. At each iteration, PolySnake builds a semantic-rich representation of the current contour and feeds it to the recurrent operator for further contour adjustment. Through these iterative refinements, the contour progressively converges to a stable state that tightly encloses the object instance. Beyond general instance segmentation, extensive experiments validate the effectiveness and generalizability of PolySnake in two additional task scenarios: scene text detection and lane detection. The results demonstrate that PolySnake outperforms existing advanced methods on multiple prevalent benchmarks across the three tasks. The code and pre-trained models are available at https://github.com/fh2019ustc/PolySnake
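The refinement loop the abstract describes — a single contour estimate, a per-vertex feature representation, and a recurrent (GRU-style) operator that regresses offsets each iteration — can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's implementation: the function and parameter names (`recurrent_refine`, `Wz`, `Uz`, etc.) are hypothetical, feature sampling is reduced to nearest-neighbour lookup, and the weights are random rather than learned.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_features(feature_map, contour):
    # Nearest-neighbour feature lookup at each contour vertex
    # (a stand-in for learned feature sampling along the contour).
    h, w, _ = feature_map.shape
    xs = np.clip(np.round(contour[:, 0]).astype(int), 0, w - 1)
    ys = np.clip(np.round(contour[:, 1]).astype(int), 0, h - 1)
    return feature_map[ys, xs]                    # (N, C)

def recurrent_refine(contour, feature_map, params, num_iters=8):
    # Maintain one contour estimate plus a per-vertex hidden state;
    # each iteration gathers features, applies a GRU-style gated
    # update, and regresses 2-D vertex offsets.
    n = contour.shape[0]
    hidden = np.zeros((n, params["Uz"].shape[0]))
    for _ in range(num_iters):
        feats = sample_features(feature_map, contour)              # (N, C)
        z = sigmoid(feats @ params["Wz"] + hidden @ params["Uz"])  # update gate
        cand = np.tanh(feats @ params["Wh"] + hidden @ params["Uh"])
        hidden = (1.0 - z) * hidden + z * cand                     # gated state
        contour = contour + hidden @ params["Wo"]                  # vertex offsets
    return contour

# Demo with random weights and a circular initial contour.
rng = np.random.default_rng(0)
C, H = 16, 32                                   # feature channels, hidden size
params = {
    "Wz": 0.1 * rng.standard_normal((C, H)),
    "Uz": 0.1 * rng.standard_normal((H, H)),
    "Wh": 0.1 * rng.standard_normal((C, H)),
    "Uh": 0.1 * rng.standard_normal((H, H)),
    "Wo": 0.1 * rng.standard_normal((H, 2)),
}
fmap = rng.standard_normal((64, 64, C))
t = np.linspace(0.0, 2.0 * np.pi, 128, endpoint=False)
init = np.stack([32 + 10 * np.cos(t), 32 + 10 * np.sin(t)], axis=1)  # (128, 2)
refined = recurrent_refine(init, fmap, params)
```

With trained weights, the gated update would let each vertex accumulate evidence across iterations, so the contour converges toward the object boundary rather than oscillating; here the random weights merely demonstrate the data flow.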
