Inverse-like Antagonistic Scene Text Spotting via Reading-Order Estimation and Dynamic Sampling (2401.03637v1)
Abstract: Scene text spotting is a challenging task, especially for inverse-like scene text, which has complex layouts, e.g., mirrored, symmetrical, or retro-flexed. In this paper, we propose a unified, end-to-end trainable inverse-like antagonistic text spotting framework, dubbed IATS, which can effectively spot inverse-like scene text without sacrificing performance on general text. Specifically, we propose an innovative reading-order estimation module (REM) that extracts reading-order information from the initial text boundary generated by an initial boundary module (IBM). To optimize and train REM, we propose a joint reading-order estimation loss consisting of a classification loss, an orthogonality loss, and a distribution loss. With the help of IBM, we can divide the initial text boundary into two symmetric sequences of control points and iteratively refine the boundary with a lightweight boundary refinement module (BRM), adapting to text of various shapes and scales. To alleviate the incompatibility between text detection and recognition, we propose a dynamic sampling module (DSM) based on a thin-plate spline that dynamically samples appropriate features for recognition within the detected text region. Without extra supervision, the DSM proactively learns to sample appropriate features for text recognition from the gradients back-propagated by the recognition module. Extensive experiments on both challenging scene text and inverse-like scene text datasets demonstrate that our method achieves superior performance on both irregular and inverse-like text spotting.
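The DSM's feature sampling can be illustrated with a standard thin-plate-spline (TPS) grid generator, as used in rectification-based recognizers. The sketch below is not the authors' implementation: the function name `tps_grid`, the number of control points, the output patch size, and the point ordering (top edge first, then bottom edge) are all illustrative assumptions. It shows the key property the abstract relies on: the sampling grid is differentiable with respect to the boundary control points, so the recognition loss alone can supervise where features are sampled.

```python
# Minimal sketch (not the paper's code) of TPS-based dynamic feature sampling.
import torch
import torch.nn.functional as F

def tps_grid(source_pts, out_h, out_w):
    """Build a sampling grid that warps the region enclosed by `source_pts`
    onto a regular (out_h, out_w) rectified patch via a thin-plate spline.

    source_pts: (K, 2) boundary control points in normalized [-1, 1] image
                coords; assumed ordering: first K//2 along the top edge,
                last K//2 along the bottom edge.
    Returns: (out_h, out_w, 2) grid usable with F.grid_sample.
    """
    K = source_pts.shape[0]
    # Target control points: evenly spaced on top/bottom edges of the output.
    xs = torch.linspace(-1, 1, K // 2)
    top = torch.stack([xs, torch.full_like(xs, -1.0)], dim=1)
    bot = torch.stack([xs, torch.full_like(xs, 1.0)], dim=1)
    target_pts = torch.cat([top, bot], dim=0)                 # (K, 2)

    # TPS radial kernel U(r) = r^2 log r^2 between target control points.
    d2 = torch.cdist(target_pts, target_pts).pow(2)
    U = d2 * torch.log(d2 + 1e-6)

    # Solve [[U, P], [P^T, 0]] w = [source_pts; 0] for spline coefficients.
    P = torch.cat([torch.ones(K, 1), target_pts], dim=1)      # (K, 3)
    A = torch.zeros(K + 3, K + 3)
    A[:K, :K] = U
    A[:K, K:] = P
    A[K:, :K] = P.t()
    b = torch.cat([source_pts, torch.zeros(3, 2)], dim=0)     # (K+3, 2)
    w = torch.linalg.solve(A, b)                              # (K+3, 2)

    # Map every pixel of the regular output grid back to source coordinates.
    gy, gx = torch.meshgrid(torch.linspace(-1, 1, out_h),
                            torch.linspace(-1, 1, out_w), indexing="ij")
    grid = torch.stack([gx, gy], dim=-1).reshape(-1, 2)       # (HW, 2)
    d2 = torch.cdist(grid, target_pts).pow(2)
    Ug = d2 * torch.log(d2 + 1e-6)                            # (HW, K)
    Pg = torch.cat([torch.ones(grid.shape[0], 1), grid], 1)   # (HW, 3)
    src = torch.cat([Ug, Pg], dim=1) @ w                      # (HW, 2)
    return src.reshape(out_h, out_w, 2)

# Usage: sample a rectified 8x32 feature patch from a (1, C, H, W) feature map
# given 14 boundary control points (sizes are illustrative). Gradients flow
# from the sampled patch back to the control points, so a recognition loss
# alone can steer the sampling, without extra detection-side supervision.
feats = torch.randn(1, 256, 64, 64)
ctrl = torch.rand(14, 2) * 2 - 1          # stand-in for a predicted boundary
grid = tps_grid(ctrl, out_h=8, out_w=32).unsqueeze(0)
patch = F.grid_sample(feats, grid, align_corners=True)        # (1, 256, 8, 32)
```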
Authors: Shi-Xue Zhang, Chun Yang, Xiaobin Zhu, Hongyang Zhou, Hongfa Wang, Xu-Cheng Yin