What's Wrong with the Bottom-up Methods in Arbitrary-shape Scene Text Detection (2108.01809v2)

Published 4 Aug 2021 in cs.MM

Abstract: The latest trend in the bottom-up perspective for arbitrary-shape scene text detection is to reason about the links between text segments using a Graph Convolutional Network (GCN). Nevertheless, even with the help of GCN, the best-performing bottom-up method remains inferior to the best-performing top-down method. We argue that this gap is not mainly caused by the limited feature-capturing ability of the text-proposal backbone or the GCN, but by their failure to make full use of visual-relational features for suppressing false detections, as well as by the sub-optimal route-finding mechanism used for grouping text segments. In this paper, we revitalize the classic text detection frameworks by aggregating the visual-relational features of text with two effective false positive/negative suppression mechanisms. First, dense overlapping text segments depicting the 'characterness' and 'streamline' of text are generated for further relational reasoning and weakly supervised segment classification. Here, relational graph features are used for suppressing false positives/negatives. Then, to fuse the relational features with visual features, a Location-Aware Transfer (LAT) module is designed to transfer text's relational features into visually compatible features, and a Fuse Decoding (FD) module enhances the representation of text regions for the second-step suppression. Finally, a novel multiple-text-map-aware contour-approximation strategy is developed in place of the widely used route-finding process. Experiments conducted on five benchmark datasets, i.e., CTW1500, Total-Text, ICDAR2015, MSRA-TD500, and MLT2017, demonstrate that our method outperforms the state of the art when embedded in a classic text detection framework, revitalizing the strength of the bottom-up methods.
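The relational reasoning the abstract refers to follows the standard GCN propagation rule of Kipf and Welling, applied over a graph whose nodes are text segments and whose edges are candidate links between them. As a minimal sketch (not the paper's implementation; the adjacency matrix, feature sizes, and weights below are illustrative assumptions), one propagation step looks like:

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One GCN propagation step (Kipf & Welling style):
    H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W).
    `adj` is a segment-linkage adjacency matrix, `feats` holds
    per-segment features, `weight` is a learned projection."""
    a_hat = adj + np.eye(adj.shape[0])           # add self-loops
    deg = a_hat.sum(axis=1)                      # node degrees
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))     # D^{-1/2}
    norm_adj = d_inv_sqrt @ a_hat @ d_inv_sqrt   # symmetric normalisation
    return np.maximum(norm_adj @ feats @ weight, 0.0)  # ReLU

# Toy example: 3 text segments, segments 0 and 1 linked;
# 4-d input features projected to 2-d relational features.
adj = np.array([[0, 1, 0],
                [1, 0, 0],
                [0, 0, 0]], dtype=float)
feats = np.random.default_rng(0).normal(size=(3, 4))
w = np.random.default_rng(1).normal(size=(4, 2))
out = gcn_layer(adj, feats, w)
print(out.shape)  # (3, 2)
```

In this framing, each segment's updated feature mixes in the features of its linked neighbours, which is what lets the network score whether two segments belong to the same text instance.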

Authors (6)
  1. Chengpei Xu
  2. Wenjing Jia
  3. Tingcheng Cui
  4. Ruomei Wang
  5. Xiangjian He
  6. Yuan-Fang Zhang
Citations (1)