
MorphText: Deep Morphology Regularized Arbitrary-shape Scene Text Detection (2404.17151v1)

Published 26 Apr 2024 in cs.MM and cs.CV

Abstract: Bottom-up text detection methods play an important role in arbitrary-shape scene text detection, but two restrictions prevent them from reaching their full potential: 1) the accumulation of false text segment detections, which affects subsequent processing, and 2) the difficulty of building reliable connections between text segments. To address these two problems, we propose a novel approach, named "MorphText", that captures the regularity of texts by embedding deep morphology for arbitrary-shape text detection. To this end, two deep morphological modules are designed to regularize text segments and determine the linkage between them. First, a Deep Morphological Opening (DMOP) module is constructed to remove false text segment detections generated in the feature extraction process. Then, a Deep Morphological Closing (DMCL) module is proposed to allow text instances of various shapes to stretch their morphology along their most significant orientation while deriving their connections. Extensive experiments conducted on four challenging benchmark datasets (CTW1500, Total-Text, MSRA-TD500 and ICDAR2017) demonstrate that our proposed MorphText outperforms both top-down and bottom-up state-of-the-art arbitrary-shape scene text detection approaches.
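
For readers unfamiliar with the two operators the modules are named after, the sketch below shows classical binary morphological opening and closing on a toy segment mask. It is not the paper's DMOP or DMCL, which are learned deep modules; the fixed structuring elements, the helper names (`erode`, `dilate`, `opening`, `closing`) and the toy mask are all assumptions made for illustration. Opening (erosion followed by dilation) removes a blob smaller than the structuring element, loosely analogous to discarding a false segment, while closing (dilation followed by erosion) with a horizontal element bridges a small gap between two segments along that orientation.

```python
import numpy as np


def _windows(mask, kh, kw, fill):
    """Yield every shifted view of `mask` covered by a kh x kw element (odd sizes)."""
    ph, pw = kh // 2, kw // 2
    padded = np.pad(mask, ((ph, ph), (pw, pw)), mode="constant", constant_values=fill)
    h, w = mask.shape
    for dy in range(kh):
        for dx in range(kw):
            yield padded[dy:dy + h, dx:dx + w]


def dilate(mask, kh, kw):
    """Binary dilation: a pixel is set if ANY pixel under the element is set."""
    out = np.zeros_like(mask, dtype=bool)
    for win in _windows(mask, kh, kw, fill=False):
        out |= win
    return out


def erode(mask, kh, kw):
    """Binary erosion: a pixel survives only if ALL pixels under the element are set."""
    out = np.ones_like(mask, dtype=bool)
    # Padding with True means the image border itself never erodes a pixel.
    for win in _windows(mask, kh, kw, fill=True):
        out &= win
    return out


def opening(mask, kh, kw):
    """Erosion then dilation: removes blobs smaller than the element."""
    return dilate(erode(mask, kh, kw), kh, kw)


def closing(mask, kh, kw):
    """Dilation then erosion: fills gaps narrower than the element."""
    return erode(dilate(mask, kh, kw), kh, kw)


# Toy "segment map": two 3x3 text segments on one line, separated by a
# 2-pixel gap, plus one isolated pixel standing in for a false detection.
mask = np.zeros((7, 13), dtype=bool)
mask[2:5, 2:5] = True    # segment A
mask[2:5, 7:10] = True   # segment B
mask[0, 11] = True       # spurious single-pixel detection

opened = opening(mask, 3, 3)   # drops the isolated pixel, keeps both segments
closed = closing(opened, 1, 3) # 1x3 horizontal element bridges the gap

print(mask.sum(), opened.sum(), closed.sum())
```

The 1x3 element here is a hand-picked stand-in for "the most significant orientation"; in a learned setting the structuring element would be a trainable parameter rather than a fixed shape.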

Authors (5)
  1. Chengpei Xu (12 papers)
  2. Wenjing Jia (24 papers)
  3. Ruomei Wang (6 papers)
  4. Xiaonan Luo (21 papers)
  5. Xiangjian He (34 papers)